Page 1 of 1

[Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Sat Apr 03, 2021 8:23 pm
by Hornwitser
When the ConnectionAcceptOrDeny message from the server is dropped on the way to the client it's possible for the server to end up deadlocked waiting for the client to continue the joining process. See the annotated image of the packet capture I've attached.

The server starts sending SynchronizerActions to the client immediately after it has sent the ConnectionAcceptOrDeny message, but if the client doesn't receive the ConnectionAcceptOrDeny message it waits half a second and then resends the ConnectioRequestReplyConfirm message it sent to the server.

In response to the second ConnectionRequestReplyConfirm message the server sends a modified ConnectionAcceptOrDeny message back with a higher first sequence number, if the client continues from this sequence then it will miss all the synchronizer actions sent by the server between the first and second ConnectionAcceptOrDeny message.

If the client is waiting for one of those synchronizer actions then the client and server deadlocks, the client is waiting for the server to send a message it has already sent, and the server is waiting for the client to signal it's finished downloading the auxiliary data. For players connected to the server this shows up as a progress bar at 100% saying "Server (<server>) is saving the map for player.", for the player connecting it's a progress bar at 100% saying "Waiting for the server to save the map.". The server remains frozen until the player connecting aborts the connection.

Re: [Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Sat Dec 18, 2021 4:53 pm
by oof2win2
Please fix this. It still occurs sometimes, and it forces the server to be restarted.

Here is a bit of code to replicate the issue. Doesn't support passworded servers (yet), but replicates the issue successfully. Existing data (such as serverKey and serverKeyTimestamp) was taken from running a Factorio instance and connecting it, with Wireshark logging the packets. The fake client will be dropped after some time as it doesn't change it's nextToRecieveServerTickClosure value, but for that time the server will be stuck saving.
https://github.com/oof2win2/factorio-connect-client

Re: [Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Mon May 30, 2022 1:56 pm
by Rseding91
After looking into this (again) I think I found at least a solution that prevents the deadlock and it will be in the next 1.1. release.

Re: [Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Mon Jun 06, 2022 7:14 pm
by oof2win2
The issue still persists in 1.1.60, which is the latest 1.1. release. See the github repo provided above for a working implementation that deadlocks the server 100% of the time. You only need to change the instance and request IDs - I do it by Wireshark and then I increment the request ID by an amount, which usually works.

Image

Re: [Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Mon Jun 06, 2022 11:58 pm
by Rseding91
oof2win2 wrote: Mon Jun 06, 2022 7:14 pm The issue still persists in 1.1.60, which is the latest 1.1. release. See the github repo provided above for a working implementation that deadlocks the server 100% of the time. You only need to change the instance and request IDs - I do it by Wireshark and then I increment the request ID by an amount, which usually works.
That's packet manipulation not packet loss. Feel free to make a different report about packet manipulation.

Re: [Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Thu Jun 09, 2022 1:19 pm
by Windsinger
Respectfully, you just ignored a perfectly worded PoC that allow malicious users to take down a server running with your software and you tell them to make a new bug report out of it. Are you serious?

Re: [Rseding] [1.1.30] Deadlock on loss of ConnectionAcceptOrDeny message

Posted: Thu Jun 09, 2022 6:20 pm
by Hornwitser
Windsinger wrote: Thu Jun 09, 2022 1:19 pm Respectfully, you just ignored a perfectly worded PoC that allow malicious users to take down a server running with your software and you tell them to make a new bug report out of it. Are you serious?
To be fair the issue this report is about is that the server sends an incorrect sequence number when the client retries a connection request and as a result of that the client holds up the server. This issue is not about the ability of a client to hold up the server during saving, that is technically a different issue concerning a different part of the network code. Which by the way has now been fixed for the next release.