[14.3] Unable to "Catch Up" with capable hardware

kogimus
Manual Inserter
Posts: 4
Joined: Sat Sep 03, 2016 10:48 pm

Post by kogimus »

So, my friends and I are trying to play Factorio multiplayer and we've run into an interesting issue.

It would seem that if your server is running on a high-specification machine (http://www.userbenchmark.com/UserRun/1653954), similarly high-specification machines will not be able to connect:
My machine: http://www.userbenchmark.com/UserRun/1653805
The other user was not able to run a benchmark at the time because he had to go put on daddy-pants and watch the baby; his machine is similarly spec'd, but is built around an i5-4690.
The client can connect and successfully download the map, but when the "Catching Up" phase is reached, the progress stays steady for a little while, maybe moves up a few ticks, then steadily goes down until the client is disconnected with a message stating that it wasn't fast enough to catch up.

On the other hand, a SIGNIFICANTLY slower / lower specification machine was relatively easily able to connect to the same server using the same WAN connection, although the "catching up" phase took between 60 and 120 seconds.
http://www.userbenchmark.com/UserRun/1653807

The working theory we came up with was that Factorio's netcode doesn't take higher-end systems into account: when the server is running on a high-end system and the connecting clients are also fairly high-end machines, the server isn't expecting heartbeat/tick responses as fast as they are being returned.

To test this theory, we mirrored the server's state and configuration information to a slower system (dual Xeon 5610s 8GB DDR2-5300F RAM in a Dell Poweredge 2950 2U rackmount), and everyone was able to connect without issue, further indicating that this might be the cause.

Attached are the server-side logs from our unsuccessful testing of the server on the Intel Core i7-6950X based system.
**EDIT**
It was brought to my attention by the server admin that it would be worthwhile to mention that the Factorio server is running in a guest VM with 4 cores and 8GB RAM, hosted on one of the Mushkin SSDs. The host OS for the VM is Windows 7, the guest OS for the Factorio server is Fedora 24, and virtualization is being done with Oracle VirtualBox 5.1.4r110228. The system is configured to have VT-x and VT-d enabled, and the VM is configured to take advantage of both.
Attachments
factorio-previous.log
(26.96 KiB) Downloaded 202 times
factorio-current.log
(12.63 KiB) Downloaded 183 times

inetknght
Burner Inserter
Posts: 14
Joined: Fri Sep 09, 2016 5:06 am

Re: Unable to "Catch Up" when server and clients are too fast.

Post by inetknght »

Also, we're using the Factorio 14.3 headless server. We don't want to play on the Dell Poweredge because its available WAN bandwidth is significantly lower and because it's a significantly less powerful machine.

Loewchen
Global Moderator
Posts: 8308
Joined: Wed Jan 07, 2015 5:53 pm

Re: Unable to "Catch Up" when server and clients are too fast.

Post by Loewchen »

Please upload the log of the connecting client.

kogimus
Manual Inserter
Posts: 4
Joined: Sat Sep 03, 2016 10:48 pm

Re: Unable to "Catch Up" when server and clients are too fast.

Post by kogimus »

Attached is the requested log.

The test was performed on the same relatively high-end setup linked above, if that matters.
Attachments
factorio-current.log
(9.49 KiB) Downloaded 225 times

Loewchen
Global Moderator
Posts: 8308
Joined: Wed Jan 07, 2015 5:53 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Loewchen »

In general, Factorio uses only the CPU, and only one core of it, to catch up to the server after the map download. The server hardware has no influence, other than that insufficient CPU power would make the game run below 60 ticks per second. The only metric for a client's ability to catch up to a 60-tick game is its single-core FLOPS. It seems to me that you are running a map that exactly hits the window in which the i5-6300U is just fast enough while the FX-8350 is no longer sufficient.
Therefore I consider this not a bug.
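
To put rough numbers on the catch-up condition described in this post, here is a minimal sketch. It assumes the server runs at a constant 60 updates per second and that the client simulates at a constant rate; the function name and the example figures are hypothetical and not taken from Factorio itself.

```python
def catch_up_seconds(backlog_ticks, client_ups, server_ups=60.0):
    """Seconds until the client reaches the server's current tick.

    Returns None when the client is not faster than the server, because
    the backlog then never shrinks (the progress bar stalls or goes
    backwards).
    """
    surplus = client_ups - server_ups  # ticks gained on the server per second
    if surplus <= 0:
        return None
    return backlog_ticks / surplus


# Hypothetical example: one minute of backlog (3600 ticks at 60 UPS).
print(catch_up_seconds(3600, 90))  # 120.0
print(catch_up_seconds(3600, 55))  # None
```

Under these assumptions, a client that can only just exceed 60 UPS on a given map needs a very long time to join, and one that falls slightly below 60 UPS can never join, which would match the symptom reported in the first post.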

inetknght
Burner Inserter
Posts: 14
Joined: Fri Sep 09, 2016 5:06 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by inetknght »

Loewchen,

You're telling us that it's not a bug? Are you going to tell us that this is intended?

Because that is 100% purely unacceptable. Because, quite frankly, if the server's hardware is capable of staying at 60 ticks per second, then how could any client ever connect? The client would always have to have equal or faster hardware in comparison to the server.

There are thousands of games where this type of problem simply does not happen. So the fact that this happens in this game shows poor design. Stating that a show-stopping problem is not a bug is very much not conducive to a company which wants people to buy the game. If we didn't have old hardware lying around, this game would be unplayable for my friends and me, and we would be asking for refunds.

I see very little reason that there should not be a server-side option "slow down to slowest client", as a worst case scenario, or a configuration that states how many ticks per second to play at (instead of 60).
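
A minimal sketch of what such a "slow down to slowest client" rule could look like. This is purely illustrative: the function, its parameters, and the 20 UPS floor are assumptions made up for this example, not an existing Factorio server option.

```python
def target_ups(client_max_ups, nominal_ups=60.0, floor_ups=20.0):
    """Pick a server tick rate that the slowest connected client can sustain.

    client_max_ups: the highest update rate each client reports it can
    simulate (hypothetical measurement). The result never exceeds
    nominal_ups and never drops below floor_ups, so one very slow client
    cannot stall the game completely.
    """
    if not client_max_ups:
        return nominal_ups  # nobody connected, run at full speed
    slowest = min(client_max_ups)
    return max(floor_ups, min(nominal_ups, slowest))


print(target_ups([75.0, 48.0]))  # 48.0
```

The floor is one possible answer to the question of what to do when a much slower machine joins; below it, the client would presumably still have to be dropped.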

As a fellow software engineer, I also see very little reason that the game should not be able to take advantage of multiple cores.

kevrlet
Manual Inserter
Posts: 1
Joined: Sun Sep 11, 2016 2:28 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by kevrlet »

Along similar lines, and contributing to the overall problem, why in god's name is the game sending the entire map at login? Especially (seemingly) uncompressed? Imagine, if you will:

- A server that is busier than just 3 people
- A server that has been online more than two weeks
- A server populated by players who like to expand, instead of fortifying a single position

All of these things will result in a map that balloons in size far quicker than ours does. Now, the larger the map the better the server hardware/network connection required to transfer the map in a reasonable time frame to multiple players. This is pretty important because the longer the map takes to transfer, the more likely you are to fall victim to network hiccoughs forcing you to reconnect and re-download the entire map. Now, our map is only about 40MB and already takes several minutes. If we continue to play this game for any real length of time, that could very easily reach over 100MB...at which point, I would no longer expect anyone to be able to connect.

I do find myself wondering, who decided this was good design? Minecraft never had this particular problem, even at the advent of large servers. Would one not think learning a lesson from one of the giants of creative games a prudent choice?

I also have my concerns about whether this game can even handle a well-established, large server. There are performance issues during normal play as-is, and as mentioned, ours is not a taxing community.

kovarex
Factorio Staff
Posts: 8078
Joined: Wed Feb 06, 2013 12:00 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by kovarex »

kevrlet wrote:Along the similar lines, and contributing to the overall problem, why in god's name is the game sending the entire map at login? Especially (seemingly) uncompressed? Imagine, if you will:

- A server that is busier than just 3 people
- A server that has been online more than two weeks
- A server populated by players who like to expand, instead of fortifying a single position

All of these things will result in a map that balloons in size far quicker than ours does. Now, the larger the map the better the server hardware/network connection required to transfer the map in a reasonable time frame to multiple players. This is pretty important because the longer the map takes to transfer, the more likely you are to fall victim to network hiccoughs forcing you to reconnect and re-download the entire map. Now, our map is only about 40MB and already takes several minutes. If we continue to play this game for any real length of time, that could very easily reach over 100MB...at which point, I would no longer expect anyone to be able to connect.

I do find myself wondering, who decided this was good design? Minecraft never had this particular problem, even at the advent of large servers. Would one not think learning a lesson from one of the giants of creative games a prudent choice?

I also have my concerns about whether this game can even handle a well-established, large server. There are performance issues during normal play as-is, and as mentioned, ours is not a taxing community.
Thank you for the suggestion to change the network model, but we are not planning to do so. Minecraft is a completely different game than Factorio, which requires a different approach. The model of "update what you see and what changes" might work "kind of fine" in the case of Minecraft, but in Factorio, where the whole factory changes all the time, it just couldn't work at all.

inetknght
Burner Inserter
Posts: 14
Joined: Fri Sep 09, 2016 5:06 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by inetknght »

Mind you, I don't necessarily think that the netcode is specifically the problem. Instead, I don't believe that the game needs to be single-threaded. And, ultimately, limiting the game to a single core is what's making a processor's single-core performance so important, is it not?

So I'm very curious about why you think it "just couldn't work at all". Do you have some not-so-light reading on the architecture of Factorio's code? Or, hell, the code itself?

orzelek
Smart Inserter
Posts: 3911
Joined: Fri Apr 03, 2015 10:20 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by orzelek »

inetknght wrote:Mind you, I don't necessarily think that the netcode is specifically the problem. Instead, I don't believe that the game needs to be single-threaded. And, ultimately, limiting the game to a single core is what's making a processor's single-core performance so important, is it not?

So I'm very curious about why you think it "just couldn't work at all". Do you have some not-so-light reading on the architecture of Factorio's code? Or, hell, the code itself?
Kovarex is the main dev.. so my guess would be he knows :D
It was also explained quite a few times here on the forums and in some FFFs.

Rseding91
Factorio Staff
Posts: 13204
Joined: Wed Jun 11, 2014 5:23 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Rseding91 »

inetknght wrote:There are thousands of games where this type of problem simply does not happen. So the fact that this happens in this game shows poor design. Stating that a show-stopping problem is not a bug is very much not conducive to a company which wants people to buy the game. If we didn't have old hardware lying around, this game would be unplayable for my friends and I, and we would be asking for refunds.
I challenge you to name 1 that does what Factorio does at the scale it does. Unless someone has been developing in secret for many years there isn't any other game that does what Factorio does.
If you want to get ahold of me I'm almost always on Discord.

inetknght
Burner Inserter
Posts: 14
Joined: Fri Sep 09, 2016 5:06 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by inetknght »

orzelek wrote:Kovarex is the main dev.. so my guess would be he knows :D
Good info to know.
orzelek wrote:It was also explained quite few times here on forums and in some FFF's.
I'm not the original poster. Just a friend of his. So it's unfortunate there wasn't a whole lot of research done on the subject. But now that we're here, I did a search for catch up (with 70 pages...) as well as catching up (with 10 pages). It would seem to me that the problem is well complained about. I would think that would indicate that it should not be not a bug.

There are 155 Friday Facts; how many of those deal with the problems we're encountering? How many of them are relevant to the current version of Factorio? Frankly, I don't have time to sift through that much noise. I was hoping someone would know already which forum threads or blog posts are the most relevant. I'll do the reading, I promise, I just need the signal-to-noise ratio improved.
Rseding91 wrote:I challenge you to name 1 that does what Factorio does at the scale it does. Unless someone has been developing in secret for many years there isn't any other game that does what Factorio does.
I'm going to assume that you mean to say that Factorio has a large scale. So... just about any real time strategy game does what Factorio does. Starcraft, Command & Conquer, Railroad Tycoon. Simulation games like SimCity do as well. First person shooters do this. I remember an old game I used to play, The Settlers II. It did exactly what Factorio does... and on a 486... on about a 1/100th scale.

I'd bet you'd love to see a game called EVE Online where literally _thousands_ of players end up playing together; the server needs to simulate hundreds of thousands of weapons firing, damage being processed, hit chances, AI movement, and even ship movement... and not a conveyor belt where movement is consistent in speed and direction but rather in space with dynamic speeds. And CCP has done very well to combat the actual problems by implementing solutions instead of denying that they're problems.

How about Planetside 2 where you get a few hundred people running around throwing rockets, tanks, and aircraft at each other?

Minecraft for example has performance issues when you reach tens or hundreds of players, not three.

Anyway, a common theme in these games is that if a single client cannot keep up then either the entire game server slows down or else the client gets dropped if it can't keep up a minimum. Sometimes there's no minimum as long as the client is responding to network requests; have you ever tried playing Railroad Tycoon 3 with three people, six AIs, and thousands of trains? I have. It's pretty fun when the cities are expanding.

In these games, it's not often that a client cannot keep up. Why is that? You can't tell me that it's because the client hardware is somehow better than the server hardware. Sure, maybe in pure per-core clock speed (even an i5 can have better clock speed than a Xeon), but definitely not performance per socket. There's only so much math that a single core can provide. So where the i5 gains in per-core, the Xeon gains in total number of cores. That i5? It has two or four cores. Xeons start at six. I work with machines that have 22 cores per socket and eight sockets for a whopping 176 total cores available to a single operating system.

Since a single core can't do all the math on its own in a reasonable amount of time, we have to look at how to get the math, that has to be done, out of sequence. If we can get it out of sequence (and rebuild it in sequence later), then we can offload each part to some other execution unit; whether it's another core, or to (for example) the GPU doesn't totally matter. Actually, it does, but not as much as you might think. In fact, this concept is so important that there are entire generic software platforms built around it.

It can be done, even for Factorio. Minecraft has your example. In Minecraft you end up with many chunks. The chunks get generated on-the-fly. They also get downloaded on-the-fly. By the way, you can split your physics processing per-chunk, too. And your synchronizations? Yeah you can checksum the chunks instead of the whole map.
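
As a rough illustration of the per-chunk checksum idea, here is a minimal sketch. It assumes each chunk can be serialized to bytes and is keyed by its (x, y) coordinate; the data layout and function names are hypothetical and say nothing about how Factorio or Minecraft actually store or verify their maps.

```python
import hashlib


def chunk_checksums(chunks):
    """Map each chunk coordinate to a digest of that chunk's serialized bytes."""
    return {coord: hashlib.sha256(data).hexdigest() for coord, data in chunks.items()}


def mismatched_chunks(server_sums, client_sums):
    """Return the coordinates whose digests differ or exist on only one side."""
    coords = set(server_sums) | set(client_sums)
    return sorted(c for c in coords if server_sums.get(c) != client_sums.get(c))


# Hypothetical two-chunk map where the client's copy of chunk (0, 1) has drifted.
server_map = {(0, 0): b"belt belt inserter", (0, 1): b"furnace"}
client_map = {(0, 0): b"belt belt inserter", (0, 1): b"furnace!"}

print(mismatched_chunks(chunk_checksums(server_map), chunk_checksums(client_map)))
# [(0, 1)]
```

With digests per chunk, only the chunks that differ would need to be re-sent, rather than the whole map.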

There's no reason you can't other than "we're not planning to do that". And frankly, that's what I'm very disappointed about. Factorio is a great game. But it's unfortunate that this problem isn't being solved. It's unfortunate that it's not only not being solved but I'm being told it's not a problem. Because you're literally denying that it's a problem that people who have recent and powerful computers are unable to play with each other. That statement is so crazy to us that we're literally bewildered.

The model of "update what you see and what changes" does work: many games have already proven it. Minecraft is one example and it's definitely not the first. EVE Online is another example; it's been around for a fair bit longer than Minecraft. Planetside, Unreal Tournament, Half-life... DNA analysis of hundreds of gigabytes, with trillions of individual data points, is done that way: you simply cannot fit that much data in a single computer. Bank processing is done that way; you simply cannot trust that much data to be in a single computer. The New York Stock Exchange? It handles trillions of calculations per second and, trust me, it's not done on one single computer.

As I mentioned earlier in this post, it works so well that there are generic software platforms built around the idea of map/reduce: the idea that you can split a single "work unit" into multiple smaller work units and distribute them across "nodes"; the node can be as large as a datacenter or as small as a single thread of execution. It's all about figuring out what your unit-of-work is, and splitting it up. In games, it's as "simple" as chunking the map.
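
A minimal sketch of the map/reduce shape described above, using map chunks as the unit of work. The chunk contents and function names are made up for illustration. Threads are used only to show the structure; in CPython they would not speed up CPU-bound work, and the sketch ignores the hard part, namely entities that interact across chunk boundaries.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_items(chunk):
    """Map step: summarise one chunk without looking at any other chunk."""
    totals = {}
    for item in chunk:
        totals[item] = totals.get(item, 0) + 1
    return totals


def merge(a, b):
    """Reduce step: combine two partial results into one."""
    merged = dict(a)
    for item, n in b.items():
        merged[item] = merged.get(item, 0) + n
    return merged


def map_reduce(chunks, workers=4):
    """Distribute chunks across workers, then fold the partial results together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_items, chunks))
    return reduce(merge, partials, {})


chunks = [["belt", "belt", "inserter"], ["belt", "furnace"], []]
print(map_reduce(chunks))  # {'belt': 3, 'inserter': 1, 'furnace': 1}
```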

Yeah it's actually not simple. Yeah it's hard to get right. But, trust me, it solves a lot of problems. Implementing the software architecture, the abstraction, which is necessary to be able to do it can be difficult. Defusing work bombs, solving deadlocks, determining bottlenecks... these are hard problems to solve. But solving them enables your software, Factorio, to actually be playable in multiplayer with people who've recently paid large sums of money to be able to play games that tax the hardware way more than Factorio does. And, hey, it also enables people to play with some fairly gigantic factories. That's a win for everyone.

That statement is so crazy to us (my friends and me) that we're literally bewildered. As a software engineer, I can assume that maybe there's a lack of developer experience here. Lacking experience isn't a bad thing. We all learn from somewhere. So let's seriously address it instead of denying that a problem is even a problem.

Or maybe there's a misunderstanding. Maybe the statement was meant to be "this isn't a bug but we do plan on fixing it, just not right now". Putting a fix off for later is fine, but just acknowledge that it's a problem and that you want to fix it. I'd certainly love a time estimate as well, but we all know how estimates are sometimes difficult to stick to.

Smarty
Global Moderator
Posts: 816
Joined: Sat Oct 04, 2014 5:00 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Smarty »

Quick trick: if someone is joining, pause the game and keep it paused until the player has downloaded the map and finished the catching-up part. It will be much faster. On large maps I have the same problem with my FX-8350.

Loewchen
Global Moderator
Posts: 8308
Joined: Wed Jan 07, 2015 5:53 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Loewchen »

Moved to general discussion...

Loewchen
Global Moderator
Posts: 8308
Joined: Wed Jan 07, 2015 5:53 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Loewchen »

inetknght wrote:I did a search for catch up (with 70 pages...) as well as catching up (with 10 pages). It would seem to me that the problem is well complained about. I would think that would indicate that it should not be not a bug.
Your search actually listed all comments (not topics) containing the word "catch" (not "catch up", as short words get ignored by the search) within the whole forum since its creation. If you limit your search to support topics with the whole phrase, you get 2 hits (not pages, single hits) one of which is this one.
In general I have not seen an automatic system that can determine if an issue qualifies as a bug, and I think using the number of search results as a metric is fundamentally flawed because:
  1. if there exists a single well written report it would deter others from reporting (for some unknown reason this is not an issue in this forum...), therefore resulting in a false negative
  2. new issues would need several weeks to jump the is-a-bug threshold in the search results
  3. a complicated to describe issue without a catchphrase as handle could easily result in a false negative
  4. phrases that are commonly used to describe general behaviour on bug occurrences like "I was walking", "when clicking", "catching up",.. could result in false positives
  5. the users could "cheat" the system by repeatedly posting things like "not enough kittens", "spider pig not to be found", "I spawned on an island",... resulting in false positives
Greetings Loewchen

Deadly-Bagel
Smart Inserter
Posts: 1498
Joined: Wed Jul 13, 2016 10:12 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Deadly-Bagel »

Smarty wrote:Quick trick you can do if someone is joining pause the game and keep it paused until the player has downloaded the map and did the catching up part it will be much faster because on large maps i have the same problem with my FX-8350
If I recall correctly that is what it used to do, no? Problem is when you're aiming for ~60 players on a single server, you only need a few people to start connecting and the game is just locked up for a few minutes. Players will get impatient and leave, so there are more open slots for more players to join and lock up the game. Also what if you have a player join on a really slow computer? You would need to set a timeout, but what to? 2 minutes is a looong time to wait for someone to time out, but is 30 seconds long enough for everyone to connect?

It's not like you have to run faster than the server, just faster than 60UPS.
Money might be the root of all evil, but ignorance is the heart.

inetknght
Burner Inserter
Posts: 14
Joined: Fri Sep 09, 2016 5:06 am

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by inetknght »

Smarty wrote:Quick trick you can do if someone is joining pause the game and keep it paused until the player has downloaded the map and did the catching up part it will be much faster because on large maps i have the same problem with my FX-8350
That's an interesting idea and one we'd explored a little bit. It works (sort of). I think it's interesting that if you can't catch up (the progress bar is going backwards), then if you pause the game to be able to catch up, you'd eventually get dropped anyway, right? Not so. Pausing the game does allow the client to finish connecting. After connecting, however, we had some trouble with the client's input not doing anything; no movement, being unable to view inventory, or anything like that. I suspect that's a separate problem (ie, the two problems aren't related). In my opinion, the not-catching-up problem is the bigger problem.

But here's the problem with pausing the game: what if you're the first person to connect to the server? Or rather, not the first person, but the only person at that time to try to connect? How do you pause the game? It's 3 AM, you can't sleep, you wanna get some Factorio in... and you can't connect. You don't want to wake up your buddy in the middle of his precious REM cycle. So you play singleplayer instead. Of course, singleplayer doesn't have the same map. Not unless we want to start a fun thing where we download the map from the server and take turns playing on it. Oh man, that sounds fun (sarcasm).

And, unfortunately, if you get into a desync loop (I've had a bunch of those, that's unfortunate...), you needlessly keep the game paused for everyone else. As is, you're only doing that while downloading the map.

posila
Factorio Staff
Posts: 5201
Joined: Thu Jun 11, 2015 1:35 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by posila »

I am not sure why this was dismissed so quickly. I think not being able to connect to a game because you cannot simulate it faster than the server is a problem (as long as you can simulate it as fast as the server can). But what I am interested in is your save, so I can try to run it and profile it.

Other problems: what if somebody with a significantly slower computer connects to the game? What if you want to play with such a person? What if you don't?

The game we want to make is one where the factory you build does its job even when you don't see it. We also want every item to be accounted for, no faking like "this chest gets an average of 10 red circuits per minute so when you look at it after 20 minutes, it will have 200 more than the last time". That is why Factorio exists as a standalone game and not as a mod in Minecraft (see the IndustrialCraft and BuildCraft mods, which inspired Factorio). If that is not a game you want to play, that is perfectly understandable.

So basically we require Factorio to update the whole factory all the time. That is not a bug nor an oversight; that is intentional behavior. Can we make it so that only the server needs to update the whole game state? I don't know. Probably yes, but it seems like a huge rewrite of ... everything, unless we just send draw commands to clients, which would require a massive amount of bandwidth. And frankly, it is quite the opposite of what the majority of our players want. They would like to run the server on a Raspberry Pi or on some crappy machine with an Intel Atom.

As for multithreaded update: it is very likely it will come in the not so distant future.

Btw: If I remember correctly, we do multiplayer exactly as Starcraft does. Everybody runs the simulation and exchanges just input actions. But in Starcraft you can have a maximum of 8 players on a small finite map, each with a 200 unit cap. That is one medium mining outpost in Factorio.
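
For readers unfamiliar with this model, the following is a minimal sketch of deterministic lockstep: every peer runs the full simulation, only input actions travel over the network, and peers compare a checksum of their state to detect desyncs. The `Simulation` class, the action format, and the checksum scheme are all hypothetical and are not Factorio's actual code.

```python
import hashlib


class Simulation:
    """A toy deterministic world: players move on a grid, one step per tick."""

    def __init__(self):
        self.tick = 0
        self.positions = {}  # player name -> (x, y)

    def step(self, actions):
        # Apply actions in a fixed (sorted) order so that every peer gets
        # the same result regardless of network arrival order.
        for player, (dx, dy) in sorted(actions):
            x, y = self.positions.get(player, (0, 0))
            self.positions[player] = (x + dx, y + dy)
        self.tick += 1

    def checksum(self):
        state = repr((self.tick, sorted(self.positions.items())))
        return hashlib.sha256(state.encode()).hexdigest()


# Only these inputs are exchanged; each peer derives the state itself.
inputs_per_tick = [
    [("alice", (1, 0)), ("bob", (0, 1))],
    [],  # nothing happens this tick, but it must still be simulated
    [("bob", (2, 0))],
]

server, client = Simulation(), Simulation()
for actions in inputs_per_tick:
    server.step(actions)
    client.step(list(reversed(actions)))  # same inputs, different arrival order

print(server.checksum() == client.checksum())  # True
```

Because every tick has to be simulated by every peer, even the empty ones, a joining client in this model has to replay the ticks it missed faster than new ones are produced.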

Sunder1977
Burner Inserter
Posts: 12
Joined: Tue Apr 12, 2016 2:04 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by Sunder1977 »

@OP: Your anecdotal evidence has been completely countered.

viewtopic.php?f=53&t=32649

Over 150 people connected, caught up, and played just fine. Obviously it's not a problem with the software.

kogimus
Manual Inserter
Posts: 4
Joined: Sat Sep 03, 2016 10:48 pm

Re: [14.3] Unable to "Catch Up" with capable hardware

Post by kogimus »

On Tuesday 9/14/2016 I hosted an 'open game' multiplayer session and we ended up with 159 players! We will be resuming this map again on Thursday (at 20:00 EST) and more people are welcome to join. If you want to help *truly* stress test these systems, please feel free to pop into the game on Thursday. This is a purely vanilla game, run on the latest experimental beta build (0.14.5 for the first session).

You can see the first session here if you're interested in the chaos:

https://www.youtube.com/watch?v=LeHEj_sEKnw
Except.. it isn't "anecdotal".

1) We provided logging and system specifications for the affected systems and computers, including the version of Factorio being played.

2) We acknowledge that our situation is already an edge/corner case, because we've got like 60 hours of factory building and a 30+ MB map.

3) Your test case (hundreds of people on an empty map starting fresh) is a different case than ours (developed map with a large number of entities, including a large number of biters spawned).

What your example proves is this:
Using Factorio 14.5 with no mods and starting fresh, a server can sync 159 people successfully.
This, while awesome, isn't ... really relevant to what's being reported.
Also:
orzelek wrote:
It was also explained quite few times here on forums and in some FFF's.
How, exactly, if it's been explained and mentioned numerous times that this is a thing that happens, is this suddenly somehow "anecdotal"?
by posila » Wed Sep 14, 2016 9:47 am

I am not sure why the this was dismissed so quickly, I think not being able to connect to game because you cannot simulate it faster than server is a problem (as long as you can simulate is as fast as server can). But what I am interested in your save. So I can try to run it and profile it.
Certainly! Attached per your request.
Attachments
Mods-1202-1409-2016.zip
Current Mods
(39.24 MiB) Downloaded 184 times
_autosave1-1201-1409-2016.zip
Most recent _autosave zip file.
(35.81 MiB) Downloaded 205 times
