UDP hole punch and forwarded UDP ports not working on headless (LNL/LiteNetLib) #3309

Closed
bontebok opened this issue Nov 4, 2021 · 7 comments
Labels
Not a Bug Not a bug but an issue

Comments

@bontebok
Collaborator

bontebok commented Nov 4, 2021

Describe the bug?

Headless servers and clients are unable to establish a peer to peer connection when the headless is behind a NAT. The only peer to peer connection I have observed is when both the client and server are not behind a NAT.

According to the Neos networking documentation, a peer to peer connection should be possible either by being on the same network or by establishing a UDP hole punch. However, clients do not establish a peer to peer connection to a headless server located behind a Type-2 (static port) NAT. Additionally, clients do not attempt to connect to the host port of the server over the public IP address. As a result, no client is able to establish a peer to peer connection to the headless, and all traffic goes through a relay server.

By resolving this bug, Neos could save money through reduced bandwidth costs, and Neos users may experience better performance and fewer desync issues by connecting directly.

Note: This issue may apply to clients hosting worlds as well, but the only tests I've conducted thus far have been with a headless server as host.

Relevant issues

No relevant issues were found. However, with the recent changes to the network stack, it is possible that this issue is the root of some desync, disconnect and other issues that users have reported. I do not have any packet captures or evidence that this has ever worked in prior builds.

To Reproduce

  1. Set up a headless server on a LAN with a modern Internet router.
  2. Ensure the router is set up to perform static port NAT (sometimes referred to as Type-2 NAT), where the UDP port numbers are preserved for outbound packets.
  3. Additionally, configure a UDP port forward on the world ports (either the default ports or the configured forcePort settings).
  4. Have any client behind a NAT attempt to connect.

You will observe that no attempts are made by the client to establish a connection to the public IP address of the server over the outbound port or the inbound forwarded ports.
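The static port behaviour from step 2 can be checked independently of Neos. The following is a minimal sketch, not anything taken from Neos or LiteNetLib: `run_reflector` and `observed_endpoint` are hypothetical helper names, and for a real test the reflector would have to run on a host outside the NAT. The demo at the bottom runs over loopback, where no NAT is involved, so the ports always match.

```python
import socket
import threading


def run_reflector(sock: socket.socket) -> None:
    """Answer one datagram with the sender's address as seen by this socket."""
    _, addr = sock.recvfrom(1024)
    sock.sendto(f"{addr[0]}:{addr[1]}".encode(), addr)


def observed_endpoint(reflector_addr, local_port=0, timeout=2.0):
    """Send one probe; return (local source port, source port the reflector saw)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("0.0.0.0", local_port))
        s.settimeout(timeout)
        s.sendto(b"probe", reflector_addr)
        reply, _ = s.recvfrom(1024)
        seen_port = int(reply.decode().rsplit(":", 1)[1])
        return s.getsockname()[1], seen_port


if __name__ == "__main__":
    # Loopback demo only: replace the reflector address with a public host
    # you control to test an actual NAT.
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))
    threading.Thread(target=run_reflector, args=(srv,), daemon=True).start()
    local, seen = observed_endpoint(srv.getsockname())
    print("port preserved" if local == seen else "port rewritten", local, seen)
    srv.close()
```

If the two ports differ when the reflector sits on the public Internet, the router is rewriting the source port and is not behaving as a static port NAT for that flow.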

Expected behavior

Expected behavior is that the UDP hole punch is successful OR the connection is established via the forwarded server ports.

Log Files

None of the headless logs appear to contain relevant data pertaining to LiteNetLib. If there is a debug command I can apply to the server, I'd be happy to test again and save any relevant logs.

Screenshots

[Screenshot: capture]

The above screenshot shows the entire LiteNetLib connection process. Note that the IP addresses have been redacted for security purposes.

Key -
.6 = Private IP address of the server
[Blank] = Private IP address of the client
.116 = Public IP address of the client
.220 = LNL Relay

I'm unfamiliar with the full process that LiteNetLib uses to perform the UDP hole punch, but packets do not show evidence of any attempt to utilize the static port NAT of the server or the client establishing a direct connection to the server via the forwarded IP.

How often does it happen?

Always

Does the bug persist after restarting Neos?

Yes

Neos Version Number

2021.10.30.605

What Platforms does this occur on?

Linux

Link to Reproduction Item/World

No response

Did this work before?

I Don't Know

If it worked before, on which build?

No response

Additional context

I'd be happy to test further or provide a copy of the packet captures to the devs upon request.

Reporters

Rucio#0134

@bontebok bontebok added the Bug Something isn't working label Nov 4, 2021
@Frooxius
Collaborator

Frooxius commented Nov 4, 2021

How many combinations of connections have you tried with this?

UDP Punchthrough isn't something that's guaranteed to work 100% of the time, because it heavily depends on the network route between the two ends. This means that not only the host connection matters, but the connecting user as well and potentially other network elements (e.g. if you're behind multiple levels of NAT).

I've seen UDP Punchthrough work on an everyday basis, I've even tested it with some headless servers right now and it works, so I don't think this is something that's completely broken.

There shouldn't really be any difference between headless and client anyways, because they're using the exact same code.

@Frooxius Frooxius added the Needs More Information Further information is requested label Nov 4, 2021
@iamgreaser

iamgreaser commented Nov 4, 2021

From experiments done last... well, moment:

A UDP server behind certain types of NAT can result in the hole punch for that port timing out if someone stops sending traffic. This is something that's happening to me during tests w/ Rucio, and we both have the "good" NAT (port numbers map directly and it doesn't screw people around)

Although this doesn't explain why an explicitly-forwarded port would break. That'll need further investigation.

What does LNL do to alleviate this situation?

@Frooxius
Collaborator

Frooxius commented Nov 4, 2021

That sounds like there's some firewall or something that might be blocking it. Neos will send regular beacon packets to keep the connection ready for punchthrough.
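As a rough illustration of why beacon packets help: a NAT usually drops a UDP mapping after a period with no traffic, so sending a small datagram at regular intervals keeps the mapping alive. The sketch below is a generic keepalive loop over loopback, not Neos's or LiteNetLib's actual beacon format; the payload, the interval and the names `send_beacons` and `count_beacons` are all assumptions made for the example.

```python
import socket
import threading
import time


def send_beacons(sock, peer, interval, stop):
    """Send a small datagram to `peer` every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        sock.sendto(b"beacon", peer)
        stop.wait(interval)


def count_beacons(duration=0.5, interval=0.1):
    """Loopback demo: count how many beacons arrive within `duration` seconds."""
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.bind(("127.0.0.1", 0))
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    stop = threading.Event()
    worker = threading.Thread(
        target=send_beacons,
        args=(send, recv.getsockname(), interval, stop),
        daemon=True,
    )
    worker.start()
    received = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # Never wait past the deadline for the next datagram.
            recv.settimeout(max(deadline - time.monotonic(), 0.01))
            try:
                recv.recvfrom(64)
                received += 1
            except socket.timeout:
                break
    finally:
        stop.set()
        worker.join()
        send.close()
        recv.close()
    return received


if __name__ == "__main__":
    print("beacons received:", count_beacons())
```

In practice the interval has to be shorter than the NAT's UDP mapping timeout, which varies from router to router.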

Can you host a headless session with a custom session ID and let me know what it is? I can check what's happening on the server-side if I know what to look for.

It might be just UDP not working with your particular combination of connections. Unfortunately that happens, networks are very messy. I'd probably check firewalls and make sure your ISP isn't filtering things either.

@iamgreaser

> Neos will send regular beacon packets to keep the connection ready for punchthrough.

OK, that's good to know.

A little lynx told me that there's a reply to this thread coming soon...

@bontebok
Collaborator Author

bontebok commented Nov 4, 2021

Thanks for the response Froox. I did more testing and determined that the static port NAT on the server was not configured appropriately (Thanks to iamgreaser for running some UDP tests with me). It seems as though the UDP hole punch is only successful if either the host/server or both parties have static port NAT.

After finally achieving LNL Relay-free connections, all of the connection problems I've been having vanished. The packet captures taken while the LNL Relay was in use showed a constant loss of packets at a regular cadence. I'm glad to have found a configuration that lets me avoid the relay, but I worry that something is causing trouble at the relay.

I think it's best to archive this issue, but I do think there are some details that would be helpful for others to know, including for customers/users operating behind a random port (Type-3) NAT, and that port forwards do not appear to be used by Neos. I'm going to keep testing; maybe there are things I can add to the documentation about properly configuring enterprise routers and routers like pfSense and OPNsense, which by default randomize the source port unless otherwise configured.

Regarding the relay, have you heard any reports of users experiencing packet drops/connection issues when utilizing it? I'd be happy to flip my config back and do some more testing or grab some packets if you'd like.

PS: The headless is RucioLess and it's running an open world named "LNL Testing". Session ID is S-5ac3d8aa-e4a8-487d-8cf4-cfb19534faae

@Frooxius Frooxius added Not a Bug Not a bug but an issue and removed Bug Something isn't working Needs More Information Further information is requested labels Nov 4, 2021
@Frooxius
Collaborator

Frooxius commented Nov 4, 2021

Thanks for the info, I'm glad you got it to work!

And yeah, unfortunately if even one of the sides randomizes the ports, the punchthrough isn't likely to work. It requires that the port stays the same between the two requests.
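To make the "same port between the two requests" requirement concrete, here is a toy model of the two requests: first each peer contacts a rendezvous server, then each peer sends to the endpoint the rendezvous saw for the other. It is a simplified sketch, not LiteNetLib's implementation. The `Nat` class, the `punch` function, the addresses (taken from documentation ranges) and the port numbers are all hypothetical, and the model assumes that a randomizing NAT picks a new external port per destination.

```python
import itertools
from dataclasses import dataclass, field

RENDEZVOUS = ("203.0.113.1", 3478)  # hypothetical rendezvous endpoint


@dataclass
class Nat:
    public_ip: str
    preserves_port: bool          # True models a static port NAT
    port_restricted: bool = True  # inbound must match the exact remote ip:port
    _next: itertools.count = field(default_factory=lambda: itertools.count(40000))
    mappings: dict = field(default_factory=dict)  # (internal_port, remote) -> external_port

    def outbound(self, internal_port, remote):
        """Record a mapping for an outbound packet; return its public source endpoint."""
        key = (internal_port, remote)
        if key not in self.mappings:
            self.mappings[key] = internal_port if self.preserves_port else next(self._next)
        return (self.public_ip, self.mappings[key])

    def accepts(self, external_port, remote):
        """Would an inbound packet from `remote` to `external_port` be let through?"""
        for (_, dest), ext in self.mappings.items():
            if ext != external_port:
                continue
            if dest == remote or (not self.port_restricted and dest[0] == remote[0]):
                return True
        return False


def punch(a: Nat, b: Nat, port_a=5000, port_b=6000) -> bool:
    # Request 1: both peers register, so the rendezvous learns their public endpoints.
    seen_a = a.outbound(port_a, RENDEZVOUS)
    seen_b = b.outbound(port_b, RENDEZVOUS)
    # Request 2: each peer sends to the endpoint the rendezvous reported for the other.
    src_a = a.outbound(port_a, seen_b)
    src_b = b.outbound(port_b, seen_a)
    # A direct path exists if at least one of those packets passes the other NAT.
    return b.accepts(seen_b[1], src_a) or a.accepts(seen_a[1], src_b)


if __name__ == "__main__":
    static_a = Nat("198.51.100.1", preserves_port=True)
    static_b = Nat("198.51.100.2", preserves_port=True)
    print("both static:", punch(static_a, static_b))
    print("one randomizing:", punch(Nat("198.51.100.1", True), Nat("198.51.100.2", False)))
```

With the strict filtering assumed by default, any randomizing side makes the punch fail in this model. Setting `port_restricted=False` on the static side lets the mixed case succeed, which is one possible reason results differ between router combinations.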

The relay is there specifically for cases like this, when due to network conditions/configuration the process just fails.

We currently have only a single Relay in Central US, so it's likely that you're tacking on a significant amount of latency, which is going to make this behave worse. We'll probably add more as time goes on, but it's an additional cost on our end.

I'm going to close this for now, thanks! If you'd like to share this info with others, I'd recommend making an article on our Wiki!

@Frooxius Frooxius closed this as completed Nov 4, 2021
@bontebok
Collaborator Author

bontebok commented Nov 4, 2021

As a follow-up, I expanded the existing article on the Neos Wiki to help users configure their routers appropriately for peer to peer communication - https://wiki.neos.com/Networking_Information
I also submitted a feature request for port forwarding and UPnP/NAT-PMP support - #3312

Thanks!


3 participants