Response from nomad server fails #24348

Open
a-bangk opened this issue Nov 1, 2024 · 3 comments

a-bangk commented Nov 1, 2024

Nomad version

Nomad v1.9.0
BuildDate 2024-10-10T07:13:43Z
Revision 7ad3685

Operating system and Environment details

Host : Windows Server 2019 Datacenter
Client : Windows Server 2019 Standard

Issue

The client forcibly closes the connection from the host in one of our environments.

Reproduction steps

Client.conf file

data_dir = "C:/nomad/data"
 
bind_addr = "0.0.0.0"
 
datacenter = "example_client"
 
client {
	enabled = true
	servers = ["our-nomad-server:4647"]
	gc_disk_usage_threshold = 95
	artifact {
		decompression_file_count_limit = 0
	}
}
 
server {
	enabled = false
}
 
plugin "raw_exec" {
	config {
		enabled = true
	}
}

Start Nomad on the client.

Expected Result

Client shows up in Nomad host clients list.

Actual Result

Connection is shut down.

Nomad Server logs (if appropriate)

2024-11-01T15:12:30.471+0100 [ERROR] nomad.rpc: failed to read first RPC byte: error="read tcp LOCAL_HOST_IP:4647->EXTERNAL_CLIENT_IP:43594: wsarecv: An existing connection was forcibly closed by the remote host."
2024-11-01T15:12:48.867+0100 [ERROR] nomad.rpc: multiplex_v2 conn accept failed: error="read tcp LOCAL_HOST_IP:4647->EXTERNAL_CLIENT_IP:31314: wsarecv: An existing connection was forcibly closed by the remote host."

Nomad Client logs (if appropriate)

    2024-11-01T15:20:48.660+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: EOF" rpc=Node.Register server=EXTERNAL_HOST_IP:4647
    2024-11-01T15:20:48.661+0100 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: EOF" rpc=Node.Register server=EXTERNAL_HOST_IP:4647
    2024-11-01T15:20:48.662+0100 [ERROR] client: error registering: error="rpc error: EOF"

tgross commented Nov 8, 2024

Hi @a-bangk! The output you're seeing there is what I'd expect to see in the event that network connectivity was lost when the client has made an initial connection and the server is trying to figure out what kind of connection it is (TLS vs non-TLS, and Raft vs RPC). The client should retry after 15s. You may need to take a look at your network environment or TLS configuration.
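To illustrate the kind of first-byte dispatch described above, here is a minimal sketch, not Nomad's actual code. The function name `classify_connection` and the tag values in `KIND_BY_TAG` are hypothetical placeholders chosen for this example. It shows why a connection that is reset or closed before any data arrives surfaces as a "failed to read first byte" style error on the accepting side.

```python
import socket

# Placeholder tag values for illustration only; they are NOT taken from Nomad's source.
KIND_BY_TAG = {
    0x01: "rpc",
    0x02: "raft",
    0x03: "tls",
}


def classify_connection(conn: socket.socket) -> str:
    """Read the first byte of a new connection and name the protocol it selects."""
    try:
        first = conn.recv(1)
    except ConnectionError as exc:
        # A reset at this point resembles the "failed to read first RPC byte" server log.
        return f"read failed: {type(exc).__name__}"
    if not first:
        # The peer closed before sending anything, so the reader sees end-of-stream.
        return "closed before first byte"
    return KIND_BY_TAG.get(first[0], "unknown")
```

If something between client and server tears the connection down right after the TCP handshake, the accepting side never gets past this first read, which would be consistent with both sides logging errors without either one having sent a full request.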

tgross moved this from Needs Triage to Triaging in Nomad - Community Issues Triage on Nov 8, 2024
tgross self-assigned this on Nov 8, 2024

a-bangk commented Nov 10, 2024

Hi @tgross, thanks for looking at it. You’re correct: the client keeps retrying from different ports. We have Nomad connecting at 3 out of 4 customers (Windows Server), but we have been stumped as to what makes this one different and prevents the connection. Windows Firewall allows it through, so now I’ll dig into the TLS config. Further pointers to isolate the block would be greatly appreciated.


tgross commented Nov 11, 2024

@a-bangk having only one node fail but fail reliably sounds like a reachability issue for that node. But I'm not much of a Windows networking administrator, so I don't have much advice for you on that front.
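One way to start narrowing down a reachability problem is a plain TCP probe run from the failing client machine. The sketch below is an assumption-laden starting point rather than a Nomad tool: `probe_tcp` is a hypothetical helper, and it only checks whether the TCP handshake completes.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a plain TCP connect and describe the outcome in a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening or something actively rejects.
        return "refused"
    except socket.timeout:
        # No answer within the timeout; often packets are being silently dropped.
        return "timeout"
    except OSError as exc:
        # DNS failure, no route to host, reset during connect, and so on.
        return f"error: {exc}"
```

For example, `probe_tcp("our-nomad-server", 4647)` uses the server address from the client config above. Note that "open" only shows the handshake succeeds; a device that inspects traffic could still reset the connection once data is sent, so comparing the result against one of the working customer sites is likely to be more informative than the result alone.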
