[raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Mon Apr 28, 2025 1:43 pm
by Evio
I found that if more than 620 distinct mods are present in the mods directory, the game won't connect at all to remote servers with mods.

I have reproduced this with 2.0.43 and 2.0.45, both with Space Age; the remote servers are headless.

UDP traffic still goes out and the server replies but it looks like the game drops the communication when this happens.

Re: [2.0.45] No multiplayer connectivity after 620 distinct mods

Posted: Mon Apr 28, 2025 2:13 pm
by Rseding91
Could you zip and upload your mods directory somewhere so that I could attempt to reproduce this? I do not know of any such game limitation around mods-on-disk and don't look forward to trying to click download on 620 mods on the mod portal.

Re: [2.0.45] No multiplayer connectivity after 620 distinct mods

Posted: Mon Apr 28, 2025 3:04 pm
by eugenekay
From previous work reverse-engineering the Multiplayer protocol I remember that the Mods/Version portion of the handshake had what appeared to be a "Length" field to indicate how many frames were to follow? I was not sure if that was part of the UDP system headers or was added by Factorio itself, but changing it caused the game to Reject the Client with a Mod error. It may simply be that 620 Mods (640 is 80 bytes; minus a bit of padding) is exceeding the max length of a Struct in the network code somewhere.

Re: [2.0.45] No multiplayer connectivity after 620 distinct mods

Posted: Mon Apr 28, 2025 3:10 pm
by Rseding91
The mods count length field supports up to 4'294'967'295 mods so I doubt that's the issue.

Re: [2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Mon Apr 28, 2025 3:55 pm
by Evio
I confirm that this happens when running Factorio inside a container (podman) where mods/ is an overlay mount. It happens regardless of whether the mods are in the upper or lower layer of the overlay.

I have 7.0G of mods and a mobile internet connection, so it's not possible for me to share the mods. I can share a script I drafted that downloads all mods from a plain-text list, but that's still overkill: the presence of 620+ mods with a valid info.json inside is enough to trigger the bug, so generating fake mods with a script works:

Code: Select all

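#!/bin/bash
# Brace expansion ({0..620}) is a bash feature, so this needs bash rather than plain sh.
mkdir empty

# Create 621 distinct fake mods: empty0_0.1.0.zip through empty620_0.1.0.zip.
for num in {0..620}; do
	echo '{"name": "empty'"$num"'", "version": "0.1.0", "title": "Empty mod '"$num"'", "author": "nobody"}' >empty/info.json

	zip -r "empty${num}_0.1.0.zip" empty
done

# Clean up the scratch directory.
rm empty/info.json
rmdir empty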
Place that as populate.sh (or run it directly) inside the lower or upper dir of the mount and run it to create 621 empty mods.

Other files have no effect; only zip files that Factorio parsed as mods count, and there must be more than 620 distinct mods. Older versions of the same mod have no effect on this.

The overlay mount is:

Code: Select all

-v ./common/mods/:/mnt/server/test/mods:O,upperdir="server/test/mods",workdir="server/test/mods.workdir"
I just checked ulimit to make sure that it's not a ulimit issue but all limits appear as unlimited both inside and outside the container.

Re: [2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Mon Apr 28, 2025 5:18 pm
by Rseding91
So, I'm doing the following (on Windows):

* I have 632 (zipped) mods in the mods folder - none of them are active.
* The in-game mods manager shows all of them as valid but inactive
* Join game by browse-LAN games -> join -> works fine
* Join game by connect to IP -> works fine
* Join game by browse public games -> join a game with some amount of mods -> after syncing and installing the mods -> works fine

My conclusion is you're hitting the ulimit for open file handles, even though we explicitly ask the OS via setrlimit to increase it to 12544.
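
For readers unfamiliar with the call: setrlimit adjusts per-process resource limits, and RLIMIT_NOFILE is the one that governs open file descriptors. Below is a minimal sketch of that kind of request, written in Python rather than taken from Factorio's code. The helper name raise_nofile_limit is hypothetical; only the 12544 target comes from the post above.

```python
import resource


def raise_nofile_limit(target=12544):
    """Try to raise the soft RLIMIT_NOFILE to `target`; return the resulting soft limit.

    Hypothetical helper for illustration. 12544 is the value quoted in the post.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit cannot be raised any further.
    if soft == resource.RLIM_INFINITY:
        return soft

    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # The OS refused the request; keep whatever limit is in place.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit before:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("soft limit after: ", raise_nofile_limit())
```

Note that a higher limit only allows more descriptors to be open at once; it does not change what an individual system call is able to handle.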

Re: [2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Mon Apr 28, 2025 5:23 pm
by Rseding91
Can you post a log file from a failed multiplayer attempt?

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Tue Apr 29, 2025 3:51 pm
by Evio
Here are the two log files; I removed the leading timestamp from each line for easy diffing.

In the “work” log the server immediately responded with mod mismatch because that run had no mods.
The “fail” log is a run with 621 fake mods present; this one ended with a connection timeout.

During the failed run the server still responded, though: with iptraf I could see UDP packets going to the server and the server responding, around 3 times per second, until the timeout. Mod downloading and updating still work fine; only game connectivity is affected.

I can make a minimal script to reproduce this in a container if that helps.

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Tue Apr 29, 2025 5:05 pm
by robot256
Evio wrote: Tue Apr 29, 2025 3:51 pm Here are the two log files; I removed the leading timestamp from each line for easy diffing.

In the “work” log the server immediately responded with mod mismatch because that run had no mods.
The “fail” log is a run with 621 fake mods present; this one ended with a connection timeout.
(Maybe post the unaltered log too, because the timestamp will show the delay between each line. And logs from both server and client if you have them.)

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Tue Apr 29, 2025 5:19 pm
by Rseding91
Raiguard looked into this and it seems it is due to file handle limits on Linux, except in this case it’s due to us using the select() function for network sockets and it failing when having > 1024 handles open (files, sockets, other, I guess). Switching it to poll() resolves the issue.

Windows doesn’t seem to share the same issue, or its limits on handles are larger.
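
One way to check this explanation on a Linux system is to look at the descriptor numbers a running process holds, since each entry under /proc/<pid>/fd is named after one open descriptor. The following is a rough sketch only: the helper name descriptor_summary is made up for illustration, and the 1024 threshold is the FD_SETSIZE value commonly used on Linux.

```python
import os
import sys


def descriptor_summary(pid, threshold=1024):
    """Return (open_count, highest_fd, count_at_or_above_threshold) for a process.

    Reads /proc/<pid>/fd, so this is Linux-only and needs permission to
    inspect the target process. highest_fd is -1 if nothing is listed.
    """
    fds = [int(name) for name in os.listdir(f"/proc/{pid}/fd")]
    too_high = [fd for fd in fds if fd >= threshold]
    return len(fds), max(fds, default=-1), len(too_high)


if __name__ == "__main__":
    # Pass a PID on the command line, or inspect this script's own process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    count, highest, high_count = descriptor_summary(pid)
    print(f"pid {pid}: {count} open descriptors, highest number {highest}")
    print(f"{high_count} descriptor(s) numbered 1024 or above")
```

If a network socket ends up with a number at or above the threshold, that would be consistent with the select() explanation given above.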

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Tue Apr 29, 2025 7:32 pm
by Evio
Was this problem actually reproduced? It seems to be specific to running inside OCI containers, such as the ones made with podman/docker: the problem doesn't happen if I run Factorio as a normal program in Linux, only inside containers.

As for the other logs and timestamps: I don't have those logs now, but there was no unusual delay between actions and the server didn't acknowledge any connection attempts in its logs.

From: https://www.man7.org/linux/man-pages/man2/select.2.html
WARNING: select() can monitor only file descriptors numbers that
are less than FD_SETSIZE (1024)—an unreasonably low limit for many
modern applications—and this limitation will not change. All
modern applications should instead use poll(2) or epoll(7), which
do not suffer this limitation.
Makes sense, although TCP connections still work when the issue is triggered.
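
The limitation quoted from the man page can be seen in a small experiment. This sketch is in Python, not Factorio's code, and the function name compare_select_and_poll is hypothetical. It duplicates a UDP socket onto a descriptor numbered 1024 or higher, then waits for a datagram with both select and poll. It assumes a Linux-like system where FD_SETSIZE is 1024 and where the hard descriptor limit is above 1024.

```python
import fcntl
import os
import resource
import select
import socket


def compare_select_and_poll(min_fd=1024):
    """Wait for one UDP datagram on a descriptor numbered >= min_fd, using both APIs."""
    # Make sure the soft limit permits a descriptor number that high.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = min_fd + 16
    if soft != resource.RLIM_INFINITY and soft < wanted:
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))

    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    # F_DUPFD returns the lowest free descriptor number that is >= min_fd.
    high_fd = fcntl.fcntl(receiver.fileno(), fcntl.F_DUPFD, min_fd)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"ping", receiver.getsockname())

    results = {"fd": high_fd}
    try:
        try:
            readable, _, _ = select.select([high_fd], [], [], 1.0)
            results["select"] = "ready" if readable else "timeout"
        except ValueError as exc:
            # CPython refuses descriptors that do not fit in an fd_set.
            results["select"] = f"error: {exc}"

        poller = select.poll()
        poller.register(high_fd, select.POLLIN)
        events = poller.poll(1000)  # timeout in milliseconds
        results["poll"] = "ready" if events else "timeout"
    finally:
        os.close(high_fd)
        sender.close()
        receiver.close()
    return results


if __name__ == "__main__":
    print(compare_select_and_poll())
```

CPython checks the descriptor range and raises ValueError for select, while poll takes descriptor numbers directly and has no such ceiling. In C there is no clean error: using FD_SET with a descriptor at or above FD_SETSIZE is undefined behaviour.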

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Tue Apr 29, 2025 7:46 pm
by Rseding91
I was told it was reproduced.

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Tue Apr 29, 2025 7:56 pm
by Evio
If the change makes it to a new Factorio version I'll try to replicate to confirm then. Thanks for the help.

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Wed Apr 30, 2025 8:59 pm
by raiguard
Evio wrote: Tue Apr 29, 2025 7:32 pm Was this problem actually reproduced?
Yes, I reproduced it on my machine (Fedora Linux, i9-10900k), but I had to up the number of mods to around 650. LAN games would not show up whatsoever and connecting to public games would fail (even though I could see them). Changing all usages of select() to poll() resolved the issue.

Re: [raiguard][2.0.45] No multiplayer connectivity after 620 distinct mods (inside podman container)

Posted: Thu May 01, 2025 4:34 pm
by raiguard
Thanks for the report, this has been fixed for 2.0.48.