Odd lag in MP

Posted: Wed Jun 14, 2017 10:35 pm
by fwyrl
Yes, I did read the list of possible reasons for MP lag here: viewtopic.php?f=49&t=4400

This is not that: the host and all clients can run my factory in SP just fine, and we all have > 1 MB/s up and down (I have 50 MB/s up and down, but some of them are on slightly slower connections).

I have a fairly large SP factory, started around .13, which I've played a few hundred hours on. It's not too intensive yet, and if I drop below 50 FPS/40 UPS it's because I nuked something near a lot of biter bases and they're swarming towards me. When I load the map in MP by myself I run fine; it's a tad slower, but not very noticeable. However, if anyone else joins, no matter how powerful their PC, they are instantly brought to 3 FPS/3 UPS or lower. After about 5 minutes this clears up, and they are simply playing with about half a second of latency. The lag comes back after another 15-20 minutes, and again stays for almost 5. I do not experience any lag during these sessions; only the other player does.

It gets more interesting though. When they host (no matter which one it is; I've tried about 6 players from across the continental United States, with vastly different connection speeds and PCs), they have absolutely no lag issues, and when I join them, I am reduced to 20 FPS/15 UPS. This persists for several minutes and then vanishes, leaving me playing as if I have a ping of around 500 again, like the others. I was not able to play on any one host long enough to see if the second wave of lag existed for me, as they all got bored, unfortunately.

I did check if this was related to network speed, but all of them reported that Factorio wasn't even using close to the amount of network I/O that was available to it (not being used by other programs).

In summary: The factory runs fine in SP, and for the host. However, if anyone joins it from a different computer, they suffer crippling lag (< 3 FPS/UPS) for several minutes, and then play on (what is for my network) massive latency (500 ms, give or take). The only time this is not as bad is if I, the original creator of the map, join one of them. I still suffer terrible lag and bad ping, but I do not lose nearly as much FPS/UPS.

If you would like, I can post the map download somewhere (if so, where should I post it? It is 85.5 MB). I can also replicate this issue and try to get logs from my friends to post here if you need them. If you'd like me to test anything with this map or these players, please let me know. I'd like to track down whatever is causing this lag, whether it be a bug (not 100% sure it is, hence why I posted in technical help rather than bugs) or something else.

Edit: I just realized I should post my system specs:
RAM: 16 GB DDR3 (Yeah, I know it's a bit slow)
CPU: i7 6700HQ Skylake
GPU: Nvidia GTX 965m
OS: Win 7 Pro, 64-bit
Internet connection is through LAN, and I'm playing vanilla Factorio here.

Re: Odd lag in MP

Posted: Fri Jun 16, 2017 10:04 am
by AlienX
Could you upload your save?
I could test this myself with a few mates on my headless to see if it's able to be reproduced.

Re: Odd lag in MP

Posted: Sat Jun 17, 2017 3:29 am
by bk5115545
So this does sound odd but it might be networking related instead of game-code related.

Just to clarify:
The host never suffers lag.
Anyone who hosts never suffers lag.
Anyone who connects to any host ALWAYS suffers from ~500ms ping except when the game is running slower than 60 FPS/UPS. (this is the odd part)
The host pretty much always runs at 60 FPS/UPS.

Things to check:
Verify that the host isn't triggering some DDOS prevention (since Factorio uses UDP packets). This could be in a virus scanner, firewall, network driver, router, modem, or ISP. This hasn't really been a problem since the async-multiplayer came out and it's less likely the issue when there are multiple potential hosts with different hardware/ISPs but it's still possible. Usually it's a Netgear or Asus router that's the problem. You can find the checkbox to disable DDOS protection somewhere under the security settings. If this isn't the issue make sure to turn it back on though as it makes the internet a safer place :D.
Verify that the host isn't reaching their max upload bandwidth. You can check what they're sending in Win 7/8.1/10 through the Resource Monitor (it's the "Send (B/Sec)" column). In most parts of the world, upload bandwidth is a lot lower than download bandwidth; a 60/10 connection means 60 Mbps down and 10 Mbps up. Notice this is a little "b" (bits), so we need to divide by 8 to get the big "B" (bytes). For reference, a 60/10 connection is about 7.5 MB/s down and 1.25 MB/s up.
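
The bits-to-bytes conversion above can be sketched in a few lines of Python. The function name is made up for illustration, and it ignores protocol overhead, so real throughput will be somewhat lower:

```python
def mbps_to_megabytes_per_s(megabits_per_s: float) -> float:
    """Convert an advertised speed in megabits/s (Mbps) to megabytes/s (MB/s)."""
    # 8 bits per byte; protocol overhead is ignored in this sketch.
    return megabits_per_s / 8


# The 60/10 connection used as the example above.
down = mbps_to_megabytes_per_s(60)
up = mbps_to_megabytes_per_s(10)
print(f"{down} MB/s down, {up} MB/s up")
```

Compare the "up" figure against the "Send (B/Sec)" column (which is in bytes, not megabytes) to see how close the host is to the cap.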

Questions:
What Factorio version are you and your mates running?
When you press F5 (I think, maybe F4) to open the timings debug screen (a lot of white text), how big is your latency-hiding buffer (top right number) when the game is running at normal speed with high ping and how big is it at ~3 FPS/UPS? What does the host see as your number at both these times?

I'm not a dev but these answers will clarify if this issue is more suited as a bug report.

Re: Odd lag in MP [.15.x]

Posted: Sat Jun 17, 2017 7:37 am
by fwyrl
bk5115545 wrote:So this does sound odd but it might be networking related instead of game-code related.

Just to clarify:
The host never suffers lag.
Anyone who hosts never suffers lag.
Anyone who connects to any host ALWAYS suffers from ~500ms ping except when the game is running slower than 60 FPS/UPS. (this is the odd part)
The host pretty much always runs at 60 FPS/UPS.

Things to check:
Verify that the host isn't triggering some DDOS prevention (since Factorio uses UDP packets). This could be in a virus scanner, firewall, network driver, router, modem, or ISP. This hasn't really been a problem since the async-multiplayer came out and it's less likely the issue when there are multiple potential hosts with different hardware/ISPs but it's still possible. Usually it's a Netgear or Asus router that's the problem. You can find the checkbox to disable DDOS protection somewhere under the security settings. If this isn't the issue make sure to turn it back on though as it makes the internet a safer place :D.
Verify that the host isn't reaching their max upload bandwidth. You can check what they're sending in Win 7/8.1/10 through the Resource Monitor (it's the "Send (B/Sec)" column). In most parts of the world, upload bandwidth is a lot lower than download bandwidth; a 60/10 connection means 60mbps down and 10mbps up. Notice this is a little "b" so we need to divide by 8 to get the big "B." For reference, a 60/10 connection is about 8 MB/s down and 1 MB/s up.

Questions:
What Factorio version are you and your mates running?
When you press F5 (I think, maybe F4) to open the timings debug screen (a lot of white text), how big is your latency-hiding buffer (top right number) when the game is running at normal speed with high ping and how big is it at ~3 FPS/UPS? What does the host see as your number at both these times?

I'm not a dev but these answers will clarify if this issue is more suited as a bug report.
Why would the host always be at 60 FPS/UPS? I am usually at about 40, because it's a big enough factory that it's not going to run at 60/60.

As I mentioned, I have checked with each host that they are not capping internet in or out. I can't check DDOS prevention, as no one I play with has access to their router. However, this would only create connection lag. The client-side lag is the big issue here.

As I said in my post, I have 50 megabytes per second (Bytes, not Bits) up AND down. Factorio is not using that, and I am running no other programs on the net. I am also on a LAN cable, removing the bottleneck of the wifi router and card.

Is there any reason why non-hosts always suffer .5 seconds of movement lag? That seems rather odd.

We're running .15.x, good catch there, forgot not everyone's on 15 yet.

We can't do normal speed, high ping. No other map gives connection lag or FPS/UPS lag. I can either have low ping (good, responsive connection feels like <20ms ping), or I can have nuked FPS/UPS and a really horrid connection issue to boot. The latency hiding number on our map was passing 600 for clients, and not even passing 30 for the host, iirc.


Above all: could you clarify how network lag makes clients have trouble rendering the game, or updating things client side? Since Factorio is totally deterministic, the only items that must be taken into account are other players' inputs. Client-side computers should simulate at about the same speed when connecting to the map in MP as in SP, as I understand it.

AlienX wrote:Could you upload your save?
I could test this myself with a few mates on my headless to see if it's able to be reproduced.
Save attached here. Sorry it's so big. I tried 7zip-ing it, but that only cut off half a megabyte, so I figured it was better to use the file in my save folder than a strange extension that might make the forums upset.

Re: Odd lag in MP

Posted: Sun Jun 18, 2017 3:26 am
by bk5115545
fwyrl wrote: Why would the host always be at 60 FPS/UPS? I am usually at about 40, because it's a big enough factory that it's not going to run at 60/60.
I was just asking because it wasn't clear if everyone who hosted ran at about 40 or if it was just you. Sounds like it's a problem for a lot of hosts. This helps narrow issues down a bit (points to game code as it affects multiple hosts).
fwyrl wrote: As I mentioned, I have checked with each host that they are not capping internet in or out. I can't check DDOS prevention, as no one I play with has access to their router. However, this would only create connection lag. The client-side lag is the big issue here.
See the funny thing is that UDP is a connection-less protocol so there isn't any form of minimized-sliding-window that affects average round-trip-time. By client-side lag I'm assuming you mean the 0.5s delay between pressing move and actually moving. The high ping is mostly the cause of this.
fwyrl wrote: As I said in my post, I have 50 megabytes per second (Bytes, not Bits) up AND down. Factorio is not using that, and I am running no other programs on the net. I am also on a LAN cable, removing the bottleneck of the wifi router and card.
Thanks for clarifying that. It's an unusually fast connection (business/university internet?).
fwyrl wrote: Is there any reason why non-hosts always suffer .5 seconds of movement lag? That seems rather odd.
Even with the latency hiding, you should only really notice the high ping if any player is shooting (or just holding down space or 'c') as this disables the latency hiding. I know what you're seeing is different and that kind of points to a bug in the game.
fwyrl wrote: We can't do normal speed, high ping. No other map gives connection lag or FPS/UPS lag. I can either have low ping (good, responsive connection feels like <20ms ping), or I can have nuked FPS/UPS and a really horrid connection issue to boot. The latency hiding number on our map was passing 600 for clients, and not even passing 30 for the host, iirc.
Ohhhh jackpot (maybe). Was the latency hiding buffer about the same for all clients (regardless of real-world location and hardware)?
600 ticks behind means the client is running almost exactly 10 s behind the host (at 60 UPS), which is the default delay for Cisco, Netgear, and Checkpoint DDOS protection. If everyone is at about 600 ± 200, then very high packet loss (because of protections) might be the issue.
It's also possible that everyone's router overheats at exactly the same time but that's highly unlikely.
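
For reference, the ticks-to-seconds arithmetic behind that 10 s figure, as a small Python sketch. It assumes the latency number is counted in game ticks and that one tick is one update; the function name is hypothetical:

```python
def ticks_to_seconds(ticks: int, ups: float = 60.0) -> float:
    """Convert a tick count to wall-clock seconds at a given update rate (UPS)."""
    if ups <= 0:
        raise ValueError("ups must be positive")
    return ticks / ups


print(ticks_to_seconds(600))      # at the nominal 60 UPS
print(ticks_to_seconds(600, 40))  # at the ~40 UPS this save actually runs at
```

Note that on a save running at about 40 UPS, the same 600 ticks would be closer to 15 s of real time, so the match with a 10 s timeout is only exact if the game is at full speed.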
fwyrl wrote: Above all: could you clarify how network lag makes clients have trouble rendering the game, or updating things client side? Since Factorio is totally deterministic, the only items that must be taken into account are other player's inputs. Client side computers should simulate at about the same speed when connecting to the map in MP as in SP, as I understand it.
Sure thing. As far as I can tell, a single player game is basically just a multiplayer game with the network port closed. It really matters a lot that Factorio is totally deterministic because that determines what order those inputs are aggregated, sent off, and then integrated. I believe that you are triggering some sort of DDOS protection (or experiencing packet-shaping). Since the game uses UDP packets with custom-made reliability (probably the best multiplayer implementation I've seen in a while), if a UDP packet is lost then the data is just accumulated at the front of the next UDP packet and will eventually get to the client. The issue comes in when we run into rate-limiting (DDOS protection or packet-shaping) and the UDP packets start to fill up. As the UDP packets approach larger sizes (they can be up to about 64 KB), the client will have periods of a lot of inputs to integrate (covering the next couple of frames or even a full second) followed by periods of nothing to integrate (packets lost via shaping or DDOS protection). I suspect that the game can't integrate accumulated inputs fast enough and the "input buffer" that your CURRENT button presses go into, before being integrated to your local game and sent off, fills up. This would also make the host get a lower FPS/UPS because they have to track all the UDP packets that didn't arrive (this is done by the network driver/card with TCP) and then recreate the lost packets and add the current inputs to them. It's a lot of math that normally doesn't need to happen and definitely doesn't happen in single player.
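
To make the "lost data rides along in the next packet" idea concrete, here is a toy Python model. It is only my guess at the general shape of such a scheme, not Factorio's actual networking code; `simulate`, the one-input-per-tick rate, and the instant acknowledgement are all assumptions made for illustration:

```python
import random


def simulate(ticks: int, loss_rate: float, seed: int = 0) -> list:
    """Return the size (in inputs) of each packet the host sends.

    Every tick produces one new input. Inputs that were not delivered are
    resent at the front of the next packet, so packets grow during loss.
    """
    rng = random.Random(seed)
    pending = []  # inputs the client has not received yet
    sizes = []
    for tick in range(ticks):
        pending.append(tick)           # this tick's new input
        sizes.append(len(pending))     # the packet carries everything pending
        if rng.random() >= loss_rate:  # the packet got through
            pending.clear()            # assumption: the ack is instantaneous
    return sizes


print(max(simulate(600, 0.0)))  # no loss: packets never grow
print(max(simulate(600, 0.9)))  # heavy loss: packets balloon, then arrive in bursts
```

With heavy loss the client gets nothing for a while and then one large packet covering many ticks at once, which is the burst-then-starve pattern described above.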

Oh and while the game is totally deterministic, there are a few other things synced that aren't user inputs :D. There isn't a list that I know of but there is a desync detection that relies on hashing so at least hashes are synced too.
TL;DR. You can't render a frame if you haven't gotten the other players' inputs for it yet (where do the players get rendered?). This is mostly fixed by the latency hiding implementation but there are some networking quirks that may interact weirdly.

Something for the devs to look at is if there is some other reason for the slowdown. Maybe trains or circuit network states are also synced for all I know.
Can you get a screenshot of the timings debug screen for the host and a lagging client? It would give us some hints as to what could make the game smoother.

I don't think this will fix your problem ... but it's worth a try. Force a higher latency hiding buffer than the dynamically allocated one. (Use a text editor that recognizes Linux newlines, i.e. not Notepad.)
In %appdata%\Factorio\config\config.ini (around line 21 for me) set minimum_latency_in_ticks=700 and remove the preceding ';' to uncomment the line. You could try some ridiculous values if you really wanted but higher values will make combat just god awful.
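
If you would rather script that edit than do it by hand, something like the following Python sketch would uncomment and set the option. `set_ini_option` is a made-up helper, the sample text is invented rather than copied from a real config.ini, and I have not verified that the game still honors this setting:

```python
import re


def set_ini_option(text: str, key: str, value: str) -> str:
    """Uncomment `key` (if it is prefixed with ';') and set it to `value`."""
    pattern = re.compile(
        rf"^[ \t]*;?[ \t]*{re.escape(key)}[ \t]*=[^\r\n]*",
        re.MULTILINE,
    )
    # A function replacement avoids backslash handling in the new value.
    new_text, count = pattern.subn(lambda m: f"{key}={value}", text)
    if count == 0:
        raise KeyError(f"{key} not found in config text")
    return new_text


# Invented sample text, only to show the before/after shape.
sample = "; some comment\n; minimum_latency_in_ticks=0\nother=1\n"
print(set_ini_option(sample, "minimum_latency_in_ticks", "700"))
```

Read config.ini into a string, pass it through the function, and write it back with the game closed. Keep a backup copy of the file first.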

Have you tried hosting with the headless server? This is also worth a try.

Re: Odd lag in MP

Posted: Sun Jun 18, 2017 4:04 am
by bk5115545
Alright I loaded up your save and holy crap.

25k drones?
Also, all of those biters just roaming around the nests cost a few UPS.

When I removed all biters or all drones then I could run the game at 60 UPS (I started at 54).

The entity update is taking way too long (nearly 100% of frame budget). It might also be all the turrets searching for targets to shoot at (I'm not sure how they're implemented).

One thing is certain though; this save is perfect for Rseding to use for an optimization pass and it might be perfectly suited for the multi-threading changes to be tested on as well.
The thread is a bit old but it looks like he still checks it. Make sure to mention that multiplayer is terrible for some odd reason and I bet he will look at it.
viewtopic.php?f=5&t=17501

Re: Odd lag in MP

Posted: Sun Jun 18, 2017 5:38 am
by fwyrl
bk5115545 wrote: I was just asking because it wasn't clear if everyone who hosted ran at about 40 or if it was just you. Sounds like it's a problem for a lot of hosts. This helps narrow issues down a bit (points to game code as it affects multiple hosts).
Yeah, every host runs at well under 60 UPS/FPS. It's because the save is massive though, nothing unusual there.
bk5115545 wrote: See the funny thing is that UDP is a connection-less protocol so there isn't any form of minimized-sliding-window that affects average round-trip-time. By client-side lag I'm assuming you mean the 0.5s delay between pressing move and actually moving. The high ping is mostly the cause of this.
By client-side lag I mean UPS-FPS lag. I say connection lag when I mean ping issues.
bk5115545 wrote: Thanks for clarifying that. It's an unusually fast connection (business/university internet?).
College apartment.
bk5115545 wrote: Ohhhh jackpot (maybe). Was the latency hiding buffer about the same for all clients (regardless of real-world location and hardware)?
600 ticks behind points to the client running almost exactly 10s behind the host and is the default delay for Cisco, Netgear, and Checkpoint DDOS protection. If everyone is about at 600 +- 200 then super high packet loss (because of protections) might be the issue.
It's also possible that everyone's router overheats at exactly the same time but that's highly unlikely.
They were all over the place, but probably within that range. That explains the connection lag at least.
bk5115545 wrote: Sure thing. As far as I can tell, a single player game is basically just a multiplayer game with the network port closed. It really matters a lot that Factorio is totally deterministic because that determines what order those inputs are aggregated, sent off, and then integrated. I believe that you are triggering some sort of DDOS protection (or experiencing packet-shaping). Since the game uses UDP packets with custom made reliability (probably the best multiplayer implementation I've seen in awhile), if a UDP packet is lost then the data is just accumulated at the front of the next UDP packet and will eventually get to the client. The issue comes in when we run into rate-limiting (DDOS protection or packet-shaping) and the UDP packets start to fill up. As the UDP packets approach larger sizes (they can be up to about 64kb), the client will have periods of a lot of inputs to integrate (covering the next couple of frames or even a full second) followed by periods of nothing to integrate (packets lost via shaping or DDOS protection). I suspect that the game can't integrate accumulated inputs fast enough and the "input buffer" that your CURRENT button presses go into, before being integrated to your local game and sent off, fills up. This would also make the host get a lower FPS/UPS because they have to track all the UDP packets that didn't arrive (this is done by the network driver/card with TCP) and then recreate the lost packets and add the current inputs to it. It's a lot of math that normally doesn't need to happen and definitely doesn't happen in single player.

Oh and while the game is totally deterministic, there are a bit of other things synced that aren't user inputs :D. There isn't a list that I know of but there is a desync detection that relies on hashing so at least hashes are synced too.
TL;DR. You can't render a frame if you haven't gotten the other players inputs for it yet (where do the players get rendered?). This is mostly fixed by the latency hiding implementation but there are some networking quirks that may interact weirdly.

Something for the devs to look at is if there is some other reason for the slowdown. Maybe trains or circuit network states are also synced for all I know.
Can you get a screenshot of the timings debug screen for the host and a lagging client? It would give us some hints as to what could make the game smoother.

I don't think this will fix your problem ... but it's worth a try. Force a higher latency hiding buffer than the dynamically allocated one. (Use a text editor that recognizes Linux new lines (not notepad)).
In %appdata%\Factorio\config\config.ini (around line 21 for me) set minimum_latency_in_ticks=700 and remove the preceding ';' to uncomment the line. You could try some ridiculous values if you really wanted but higher values will make combat just god awful.

Have you tried hosting with the headless server? This is also worth a try.
Does that forced latency hiding work? I tried to bump it a week or so ago, and just came across a lot saying it could not be done anymore.
I don't really mind horrifying combat, because it's just nuke/run away anyways at this point. I could be 10 seconds behind and it would still be the same.
So if the game's lagging because the anti-ddos is preventing sync data from reaching it, then why does it stop lagging after about 5 minutes, then start up after a while?

Will try hosting on headless.

Re: Odd lag in MP

Posted: Mon Jun 19, 2017 12:49 am
by bk5115545
So I've never tried the forced latency hiding. I found it in like the 14.18 patch notes and haven't seen anything about it since.
If it does work, it's just going to give the game more time to receive the missing packet data so it's not reeeaaaally a solution to the problem.

I used to have college internet... I can assure you that you are under a pretty strict QoS policy and behind at least 1 layer of packet shaping. Both of these very strongly affect UDP.
It's highly probable that the game's UDP packets simply have a very low priority (as UDP is customarily used for data that doesn't need reliability and is considered to be OK to be dropped). At my University in Arkansas I could just go to the networking tech and ask for a higher packet priority on port #### and he would usually do it without question.

As to why there are periods of GOOD and then BAD, cheap QoS works on a sliding window (through time) of packet counts on a per port or per IP basis. It's possible that everything is good until your traffic reaches some critical percentage of all the packets and then you get consistently dropped until the window moves past the "good-zone." Then the QoS implementation sees you as being dropped a lot more often than the other traffic so you get a priority bump and everything is good again until you reach that critical point.
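
Here is a toy Python model of how a time-based sliding window can produce alternating good and bad stretches. It is a plain rate limiter, simpler than the priority-bump behavior described above, and the names and numbers are invented for illustration rather than taken from any real QoS product:

```python
from collections import deque


def simulate_qos(ticks: int, window: int, max_in_window: int) -> list:
    """Return, per tick, whether that tick's packet was forwarded.

    One packet is sent per tick. A packet is dropped when `max_in_window`
    packets were already forwarded during the last `window` ticks.
    """
    forwarded = deque()  # ticks at which a packet was let through
    result = []
    for tick in range(ticks):
        # Forget packets that have slid out of the window.
        while forwarded and forwarded[0] <= tick - window:
            forwarded.popleft()
        ok = len(forwarded) < max_in_window
        if ok:
            forwarded.append(tick)
        result.append(ok)
    return result


# Print the pattern as G (forwarded) / B (dropped) for a small window.
print("".join("G" if ok else "B" for ok in simulate_qos(20, 10, 5)))
```

Even this simple version shows traffic passing until the quota is used up, then a run of drops until old packets age out of the window, and then the cycle repeats.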

It's also possible there is some other issue with how the game is creating packets with this much data to sync (maybe they're huge and get fragmented). Rseding91's optimization thread would be the place to go for those answers.