Re: Map download never finishes [14.5] headless windows

Posted: Fri Jan 27, 2017 9:12 pm
by hansson
At a cursory glance, this looks more like a checksum issue than a deep packet inspection issue, given that flipping a random byte seems to bypass it (does it work when flipping / inserting *any* random byte?). For deep packet inspection to explain this, its "drop or no drop" logic would have to happen to depend on that specific byte, whereas the checksum changes whenever any single byte changes. I would proceed with the advice hatterson has given.

Re: Map download never finishes [14.5] headless windows

Posted: Fri Jan 27, 2017 10:13 pm
by d3phoenix
Network Engineer here.

Consumer grade routers are the bane of my existence.

I'd suggest trying to eliminate them from the picture first and foremost. In my experience, they're the most likely trouble points when you're dealing with strangely mangled or dropped packets in weird patterns like this.


@Admalledd/Brotemkin:

Try temporarily disconnecting everything from your internet connections, and plugging only the machines running Factorio on each end into the ISP's gateway devices. See if anything changes.

You mention that you have SPI disabled, good call. If you haven't already, you should also scour the rest of the firmware and try to disable any "smart" features that are in any way related to UDP traffic -- ALG, Smart NAT, H.323/SIP pass through, VPN passthrough, etc, as bugs in any of these could be a potential culprit.

Re: Map download never finishes [14.5] headless windows

Posted: Fri Jan 27, 2017 10:20 pm
by Voltara
Both 508-byte packets do have the same checksum:

Code:

$ cat pkt1.dat | od --endian=big -w2 -An -tu2 | awk '{sum += $1} END {print (sum + rshift(sum, 16)) % 65536}'
24005
$ cat pkt2.dat | od --endian=big -w2 -An -tu2 | awk '{sum += $1} END {print (sum + rshift(sum, 16)) % 65536}'
24005
I tried generating those 508-byte packets with different combinations of the IP addresses in the screenshot, but didn't get anything really interesting like 0x0000. (Edit: I did this before the pcap files were available, and used the wrong port numbers by mistake. See later post; these checksum to 0xFFFF when both public IP addresses are used.) It's still possible there's another layer of NAT hidden somewhere in the middle though, with an address that would.

The way to confirm or rule out the checksum would be, instead of changing only one byte, to change two 16-bit words of the payload: increment one word, decrement the other. That way, the checksum stays the same. If the resulting packet is also blocked, that's strong evidence that the checksum is indeed involved.
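A minimal Python sketch of that test, assuming a made-up payload (the function names and the sample bytes are illustrative and are not taken from the captures). Raising one 16-bit word by one and lowering another by one leaves the one's-complement sum, and therefore the UDP checksum, unchanged:

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words (odd length is zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:  # fold the carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total


def shift_words(payload: bytes, up: int, down: int) -> bytes:
    """Add 1 to the 16-bit word at index `up` and subtract 1 from the word at `down`."""
    if len(payload) % 2:
        raise ValueError("payload must have an even length")
    words = [int.from_bytes(payload[i:i + 2], "big") for i in range(0, len(payload), 2)]
    if words[up] == 0xFFFF or words[down] == 0x0000:
        raise ValueError("pick words that will not wrap around")
    words[up] += 1
    words[down] -= 1
    return b"".join(w.to_bytes(2, "big") for w in words)


# Hypothetical payload; a real test would start from the bytes in the capture.
original = bytes.fromhex("0c1122334455667788990a0b")
tweaked = shift_words(original, up=1, down=4)
print(hex(ones_complement_sum(original)), hex(ones_complement_sum(tweaked)))
```

If the tweaked packet is dropped just like the original, the payload bytes themselves are unlikely to be the trigger.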

Another thing to test regarding the checksum is whether the same packet (and same port numbers) is blocked in both directions. Because of how the UDP checksum is computed, it doesn't change if you swap the source/destination IP or port numbers.
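To illustrate why swapping the two ends does not change the value, here is a minimal sketch of the RFC 768 computation over the IPv4 pseudo-header. The function name, the sample payload and the 203.0.113.7 address are assumptions for illustration; only the port numbers and 192.168.0.12 appear in this thread.

```python
import ipaddress
import struct


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header, UDP header and payload."""
    length = 8 + len(payload)  # the UDP header is 8 bytes long
    pseudo = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!BBH", 0, 17, length)  # zero byte, protocol 17 (UDP), UDP length
    )
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field is 0 here
    data = pseudo + header + payload
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # a computed zero is transmitted as all ones


payload = b"hypothetical map chunk"
forward = udp_checksum("192.168.0.12", "203.0.113.7", 36473, 34197, payload)
reverse = udp_checksum("203.0.113.7", "192.168.0.12", 34197, 36473, payload)
print(hex(forward), hex(reverse))
```

Both addresses and both ports are simply added into the same 16-bit sum, and addition is commutative, so the two directions give the same result.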

Other things you can experiment with: increment/decrement the port numbers (36473:34197 vs 36474:34196), generate arbitrary packets (try to exercise all 65536 checksum values), or introduce a 3rd host and exchange packets between it and the two affected users (to try to rule out one user or the other).

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 12:13 am
by riking
Voltara wrote:Both 508-byte packets do have the same checksum:

I tried generating those 508-byte packets with different combinations of the IP addresses in the screenshot, but didn't get anything really interesting like 0x0000. It's still possible there's another layer of NAT hidden somewhere in the middle though, with an address that would.

The way to confirm or rule out the checksum would be, instead of changing only one byte, to change two 16-bit words of the payload: increment one word, decrement the other. That way, the checksum stays the same. If the resulting packet is also blocked, that's strong evidence that the checksum is indeed involved.

Another thing to test regarding the checksum is whether the same packet (and same port numbers) is blocked in both directions. Because of how the UDP checksum is computed, it doesn't change if you swap the source/destination IP or port numbers.

Other things you can experiment with would be to increment/decrement the port numbers (36473:34197 vs 36474:34196), generating arbitrary packets (try to exercise all 65536 checksum numbers), introducing a 3rd host and generating packets to/from it and the two affected users (attempt to rule out one user or the other.)
Or, even easier: Change the header of the packet to two bytes, as the "random byte insert" build does, but don't make it random.

Packet: [Header:2] [Data:X]
Header: [Type:1] [RetryCount:1]

Increment the RetryCount whenever you re-send the packet. That'll change the checksum, without stressing your RNG.
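A rough sketch of that layout in Python, assuming a hypothetical packet type value of 0x0C and made-up data; `build_packet` and `word_sum` are illustrative names, not Factorio code:

```python
import struct


def build_packet(packet_type: int, retry_count: int, data: bytes) -> bytes:
    """Prefix `data` with a 2-byte header: [Type:1][RetryCount:1]."""
    return struct.pack("!BB", packet_type, retry_count & 0xFF) + data


def word_sum(data: bytes) -> int:
    """16-bit one's-complement sum, the part of the UDP checksum the payload controls."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


data = b"map chunk"  # hypothetical payload
sums = [word_sum(build_packet(0x0C, retry, data)) for retry in range(3)]
print([hex(s) for s in sums])  # each retry contributes a different sum
```

Because the retry count sits in the low byte of the first word, each resend shifts the sum by one, so a packet that was dropped because of its checksum goes out with a different one on the next attempt.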

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 12:35 am
by BattleshipBrotemkin
d3phoenix wrote:Network Engineer here.

Consumer grade routers are the bane of my existence.

I'd suggest trying to eliminate them from the picture first and foremost. In my experience, they're the most likely trouble points when you're dealing with strangely mangled or dropped packets in weird patterns like this.
We're going to isolate that equipment and try to introduce some carrier diversity when next we get a chance to do some testing. Seeing this pop up in FFF came as a bit of a surprise. We have been picking at the issue over the course of the week and haven't quite gotten to taking apart our respective home networks :)

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 1:05 am
by sabriath
{tcp} NAT punching wont work
Dual-connect punches just fine.
0x0c
Request for map signature is similar to a "cannot reach" signal from backbone routing. Change this to 0x80 or more in value as your first byte in block. (All game commands for the first byte should be in the 0x80-0xff range, just my opinion.)

As for the 508 packet loss, I'm honestly at a loss. I suspect that there are several byte patterns that look too similar to TOR or other "illegal" networks within your individual zone, which get blocked automatically by the ISP. You can XOR the block with 0x55aa on a case-by-case basis whenever a specific block is lost too many times. Since the logic would be part of the "if lost" code, counting the losses wouldn't be much of a drag on the system. A 0x80 command would change to 0xD5, so it would already be a different block signature, letting the other side know which operation is required to switch it back.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 1:54 am
by admalledd
To those wondering, I took down the pcaps and logs because of simple privacy concerns. It seems my co-factory manager doesn't mind too much, so I have put them back up for anyone else who wants to investigate them.

Linky: http://www.admalledd.com/dl/prv/files/f ... g_pcaps.7z

These were captured at our respective host machines.

I'm working with BattleshipBrotemkin on scheduling so we can run as many test variations as we can. We will keep in mind the other suggestions noted here for tests to try!

@d3phoenix: will try those! Our current list of things to try (including some from here):

1. Turn off any other possible packet-mangling bits of our respective routers' firmware. I already know that some of those are enabled for me because I sometimes use VPN pass-through for work (for what I use it for, I can work around it if it is disabled).
2. Direct-attach our computers to our respective cable modems.
3. Attempt to reduce the reproduction case by using a dash of Python to recreate it at a lower level with raw packet writing. Joy!
4. Fix my OpenVPN endpoint at my vpn.home.admalledd.com or jump.admalledd.com and have us both on that for routing our UDP packets.

I think most people are probably right that this is more likely some checksum/ether-frame issue, because if it were DPI I would expect the whole flow to be terminated at those points, not just a few packets here and there.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 5:20 am
by lyallp
I have not looked at the network packets as yet but is it possible that you are 'accidentally' forming a TCP/IP packet or some other protocol, given UDP is the basis for TCP?

Does Wireshark categorise or identify the 'rogue' packet as something other than what you expect?

An awesome on-line resource on Internet Protocols can be found at http://www.tcpipguide.com/free/t_toc.htm

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 7:26 am
by d3phoenix
lyallp wrote:I have not looked at the network packets as yet but is it possible that you are 'accidentally' forming a TCP/IP packet or some other protocol, given UDP is the basis for TCP?

Does wireshark categorise or identify the 'rogue' packet as something other than what you expect it?
UDP is not the basis for TCP. UDP and TCP are two different transport (layer 4) protocols that both run on top of the network (layer 3) protocol called IP.

Twinsen already confirmed that looking through the traces shows the packets are completely ordinary and totally in line with the rest of the data stream (see the post with screenshots).

The point is, Factorio itself definitely isn't doing anything wrong. This is a problem somewhere on the network, most likely a buggy router, or knowing Comcast, badly-behaving ISP DPI/Throttling equipment.

Not sure what the devs can do other than adding random bytes or an encryption layer into the connection to make sure that none of the devices in the path can drop an identical packet over and over. Both options add overhead. That they're still looking into this after seeing those captures speaks volumes about how amazing the team is :D

Hopefully Admalledd finds something interesting in the next round of testing.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 7:46 am
by Loewchen
d3phoenix wrote: The point is, Factorio itself definitely isn't doing anything wrong. This is a problem somewhere on the network, most likely a buggy router, or knowing Comcast, badly-behaving ISP DPI/Throttling equipment.
The thing is, this issue also occurs on domestic connections within several different countries, so for it to be ISP-caused it would have to be something that completely different ISPs in different parts of the world get wrong in a similar way.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 7:50 am
by vtx
It all depends on what the culprit is.

-Checksum : people above me have already given good advice.

-Payload : You can trick the network by simply masking those packets after X losses, with X high enough that normal data loss does not trigger it. Use a new opcode for the masked packet, generate a 4-byte mask and place it inside the header. Repeat those 4 bytes to cover the size of the payload, then masked_payload = payload XOR extended_mask. Send the masked_payload to the client, and the client will reverse it to recover the original payload.

It will also change the checksum at the same time.
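The masking idea could look roughly like this in Python. The opcode value 0x7F and the function names are hypothetical, and the layout (opcode, 4-byte mask, masked payload) is just one way to read the description above:

```python
import os
from itertools import cycle
from typing import Optional

OP_MASKED = 0x7F  # hypothetical opcode; not a real Factorio message type


def xor_with_mask(data: bytes, mask: bytes) -> bytes:
    """XOR `data` with `mask` repeated to the length of `data`."""
    return bytes(b ^ m for b, m in zip(data, cycle(mask)))


def mask_packet(payload: bytes, mask: Optional[bytes] = None) -> bytes:
    """Build [opcode:1][mask:4][masked payload] for a packet that keeps getting lost."""
    while mask is None or not any(mask):  # an all-zero mask would change nothing
        mask = os.urandom(4)
    if len(mask) != 4:
        raise ValueError("mask must be exactly 4 bytes")
    return bytes([OP_MASKED]) + mask + xor_with_mask(payload, mask)


def unmask_packet(packet: bytes) -> bytes:
    """Reverse mask_packet on the receiving side."""
    if len(packet) < 5 or packet[0] != OP_MASKED:
        raise ValueError("not a masked packet")
    return xor_with_mask(packet[5:], packet[1:5])


packet = mask_packet(b"hello world", b"\x01\x02\x03\x04")
print(packet.hex(), unmask_packet(packet))
```

XOR is its own inverse, so the receiver only needs the mask from the header to recover the original bytes.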

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 11:02 am
by gurka
Twinsen wrote:I included a screenshot and I also included the packets that are being filtered, so maybe you can ask your ISP a few questions of why they are filtering those specific packets.
I re-calculated the checksum both for the first large packet and the small packet. They both (should) have checksum 0x99F2. I don't think that's a coincidence...

Edit: I re-calculated the second large packet also and surprise: the checksum is also 0x99F2.

I'm guessing something, somewhere, doesn't like that checksum.

Edit2: 0x99F2 in network byte order.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 2:08 pm
by opiemonster
The solution is to xor the data with a new key every time the data fails to send. Good luck with trying to find out what isp has the shitty deep packet filtering.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 2:49 pm
by Voltara
gurka wrote:
Twinsen wrote:I included a screenshot and I also included the packets that are being filtered, so maybe you can ask your ISP a few questions of why they are filtering those specific packets.
I re-calculated the checksum both for the first large packet and the small packet. They both (should) have checksum 0x99F2. I don't think that's a coincidence...

Edit: I re-calculated the second large packet also and surprise: the checksum is also 0x99F2.

I'm guessing something, somewhere, doesn't like that checksum.

Edit2: 0x99F2 in network byte order.
And if you take 0x99F2 and adjust it for the NAT, i.e. change 192.168.0.12 (0xC0A8000C) to the public IP 24.21.66.146 (0x18154292), the checksum becomes 0xFFFF:

Code:

99F2 + (C0A8 + 000C) - (1815 + 4292)
FFFF
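The same arithmetic as a minimal Python sketch, done in 16-bit one's-complement form (end-around carry). The helper names are illustrative, and this mirrors the calculation in the post rather than any particular router's NAT code:

```python
def ones_add(a: int, b: int) -> int:
    """Add two 16-bit values with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def ones_sub(a: int, b: int) -> int:
    """Subtract in one's complement by adding the bitwise complement."""
    return ones_add(a, ~b & 0xFFFF)


def rewrite_checksum(checksum: int, old_words, new_words) -> int:
    """Adjust a checksum after address words are replaced: checksum + old - new."""
    for word in old_words:
        checksum = ones_add(checksum, word)
    for word in new_words:
        checksum = ones_sub(checksum, word)
    return checksum


# 192.168.0.12 -> 24.21.66.146, using the values quoted in the post
adjusted = rewrite_checksum(0x99F2, old_words=(0xC0A8, 0x000C), new_words=(0x1815, 0x4292))
print(f"{adjusted:04X}")
```

In one's-complement arithmetic 0xFFFF and 0x0000 both represent zero, which is why this particular value is a plausible place for a buggy implementation to go wrong.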

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 2:52 pm
by Twinsen
Voltara wrote:
gurka wrote: I re-calculated the checksum both for the first large packet and the small packet. They both (should) have checksum 0x99F2. I don't think that's a coincidence...
Edit: I re-calculated the second large packet also and surprise: the checksum is also 0x99F2.
And if you take 0x99F2 and adjust it for the NAT, i.e. change 192.168.0.12 (0xC0A8000C) to the public IP 24.21.66.146 (0x18154292), the checksum becomes 0xFFFF:

Code:

99F2 + (C0A8 + 000C) - (1815 + 4292)
FFFF
Absolutely solid find guys!

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 4:16 pm
by gurka
Nice, Voltara! However, what's special about 0xFFFF? The only "special case" with the UDP checksum is that if you get a checksum of 0x0000, you need to actually set it to 0xFFFF, as per RFC 768.
If the computed checksum is zero, it is transmitted as all ones (the
equivalent in one's complement arithmetic).
https://www.ietf.org/rfc/rfc768.txt

Anyway, I'm more than certain that the problem here has something to do with the checksum and not with DPI. An extremely short packet (8+5 bytes) getting caught by DPI sounds extremely unlikely.


Edit: Oh wait. Yeah, when you adjust for NAT you actually will get 0x0000 and have to, as stated above, actually set the checksum to 0xFFFF.

I used this code: http://minirighi.sourceforge.net/html/udp_8c.html#a0 when I tested the three packets that were dumped previously in this thread. That algorithm actually incorrectly returns 0x0000 as the calculated checksum. My guess is that either your client code or your server code also incorrectly sets the checksum to 0x0000 instead of 0xFFFF.
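As a small illustration of that last step (a sketch with made-up function names, not the linked code), the only difference between a correct and an incorrect final step is the zero substitution from RFC 768:

```python
def finalize_udp_checksum(folded_sum: int) -> int:
    """Complement the folded 16-bit sum; a result of zero is sent as 0xFFFF."""
    checksum = ~folded_sum & 0xFFFF
    return 0xFFFF if checksum == 0 else checksum


def finalize_without_zero_rule(folded_sum: int) -> int:
    """Buggy variant: forgets that 0x0000 means "no checksum" in UDP over IPv4."""
    return ~folded_sum & 0xFFFF


# A folded sum of 0xFFFF is the one case where the two variants disagree.
print(hex(finalize_udp_checksum(0xFFFF)), hex(finalize_without_zero_rule(0xFFFF)))
```

For every other sum the two functions return the same value, so a bug like this would only show up on the rare packets whose checksum works out to exactly zero.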

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 4:34 pm
by sillyfly
gurka wrote: My guess is that either your client code or your server code also incorrectly sets the checksum to 0x0000 instead of 0xFFFF.


That would be the OS or, at the very least, some library they're using. I can't imagine Wube have rewritten the UDP implementation. And anyway, as it is only outside the NAT that this happens, it can't be in one of the endpoints, so it can't possibly be something wrong with the Factorio code.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 4:49 pm
by Twinsen
gurka wrote:My guess is that either your client code or your server code also incorrectly sets the checksum to 0x0000 instead of 0xFFFF.
You are also forgetting that this is before the NAT, so the PCs do not deal with this. It's looking like a bad router or bad hardware at the ISP. I'll put my money on a bad router. I'm curious what model it is.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 5:18 pm
by gurka
Twinsen wrote: You are also forgetting that this is before the NAT, so the PCs do not deal with this. It's looking like a bad router or bad hardware at the ISP. I'll put my money on a bad router. I'm curious what model it is.
Oh, right. Well, a solution is to roll your own checksum and set UDP's checksum to zero. It's optional in IPv4 at least. But if the bad hardware incorrectly calculates the UDP checksum, it might also incorrectly handle UDP packets with an unset / zero checksum...

Edit: It doesn't seem to be too uncommon
NAT implementations continue to improve, but there are still many NAT bugs lurking in home routers. The most general problem is corruption of a packet that has been NAT translated. The end result is that the packet will be transmitted with a bad IPv4, TCP, or UDP checksum, depending on where the corruption occurs.
http://www.qacafe.com/knowledgebase/common-router-bugs/

I don't know how other software handles this. Maybe, as I think someone has also mentioned earlier, both the client and server could retransmit packets that seem to have been dropped with extra (non-zero) padding, so that the checksum does not hit 0x0000 / 0xFFFF.

Re: Map download never finishes [14.5] headless windows

Posted: Sat Jan 28, 2017 5:54 pm
by AyrA
gurka wrote:[...]. So that the checksum does not hit 0x0000 / 0xFFFF.
Checksum 0x0000 is special because it could also indicate that the checksum has been offloaded to the network card. If this mismatches the setting of your network card, the card might drop the packet even before Wireshark or another sniffer could see it. Users who have this issue could try this (requires local administrative privileges):

1. Hit WIN+R on your keyboard
2. Type "devmgmt.msc" and hit ENTER or press OK
3. Expand the node "Network adapters" and double click on the network card you use for internet.


Note: The options below could be missing for your network card or have different names. In any case, note how they were set before you change them, in case you want to revert the changes later.


4. In the "Advanced" tab, search for "IPv4 checksum offload" and enable for receive (rx) and send (tx)
5. Repeat Step 4 for "TCP checksum offload" and "UDP checksum offload"
6. Click "OK" and restart your computer. Check if the issue is still present.

Note: If the options were already enabled, disable them. Your card might interpret them in a "negated" way, especially when the driver has been poorly translated.
===

To the devs:
To generally mitigate this problem in the future, add a 16-bit counter to the UDP packets your application sends. That way, the checksum automatically changes each time a packet is retransmitted.