Routes/DNS entries missing after client updates #3224

Open
roberthase opened this issue Jan 22, 2025 · 7 comments

Comments

@roberthase

roberthase commented Jan 22, 2025

Problem after updating Windows clients to 0.35.2 and later versions up to 0.36.3:

Clients can connect to the controller, but routes and DNS entries in the local registry get deleted right after creation.

Restarting, disconnecting and reconnecting, or updating/reinstalling the client does not help.

To resolve this issue, clients have to be deleted in the controller and re-added.
Reports from my colleagues indicate that deleting and re-adding the peer's group in the controller can also resolve it.

Self hosted controller version 0.36.3
Clients are added with setup-keys.

We have about 500 devices running NetBird, and so far 1-2% of them seem to be affected whenever we roll out updates.

@lixmal
Contributor

lixmal commented Jan 23, 2025

Hi @roberthase,

what does netbird status -dA look like on an affected device? The debug bundle would also be helpful, at least the network_map.json from the archive.

netbird debug for 1m -AS

@roberthase
Author

Hi @lixmal

Here is the netbird status -dA output. The IP address of the local interface seems off. Is NetBird using the wrong interface/route for the WireGuard tunnel?

After fixing it, it shows the correct local IP of our branch network.

C:\Users\Administrator>netbird status -dA
Peers detail:
svvi-netbird02.anon-MlAgN.domain:
NetBird IP: 100.103.156.130
Public key: EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=
Status: Connected
-- detail --
Connection type: P2P
ICE candidate (Local/Remote): host/prflx
ICE candidate endpoints (Local/Remote): 127.0.0.1:51820/10.201.0.187:58976
Relay server address:
Last connection update: 2 minutes, 15 seconds ago
Last WireGuard handshake: -
Transfer status (received/sent) 0 B/3.9 KiB
Quantum resistance: false
Routes: 10.1.0.0/24, 10.10.31.201/32, 10.10.31.202/32, 10.201.0.0/24, 10.5.0.0/24
Networks: 10.1.0.0/24, 10.10.31.201/32, 10.10.31.202/32, 10.201.0.0/24, 10.5.0.0/24
Latency: 4.6836ms

svvi-netbird03.anon-MlAgN.domain:
NetBird IP: 100.103.193.163
Public key: G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=
Status: Connected
-- detail --
Connection type: Relayed
ICE candidate (Local/Remote): host/relay
ICE candidate endpoints (Local/Remote): 127.0.0.1:51820/198.51.100.0:58030
Relay server address:
Last connection update: 2 minutes, 13 seconds ago
Last WireGuard handshake: 13 seconds ago
Transfer status (received/sent) 4.6 KiB/680 B
Quantum resistance: false
Routes: -
Networks: -
Latency: 6.8562ms

OS: windows/amd64
Daemon version: 0.36.3
CLI version: 0.36.3
Management: Connected to https://netbird.anon-c1NgS.domain:33073
Signal: Connected to http://netbird.anon-c1NgS.domain:10000
Relays:
[stun:netbird.anon-c1NgS.domain:3478] is Available
[turn:netbird.anon-c1NgS.domain:3478?transport=udp] is Available
Nameservers:
[10.201.0.10:53, 10.201.0.11:53] for [anon-MlAgN.domain, anon-ru7CX.domain, anon-h7sUo.domain, wiki.anon-c1NgS.domain, vault.anon-c1NgS.domain] is Unavailable, reason: read udp 100.103.240.253:63630->10.201.0.11:53: i/o timeout
FQDN: nb120-09.anon-MlAgN.domain
NetBird IP: 100.103.240.253/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Networks: -
Peers count: 2/2 Connected

The network map is rather large, so I excluded the custom zones:

{
"Serial": "822",
"peerConfig": {
"address": "100.103.240.253/16",
"dns": "",
"sshConfig": {
"sshEnabled": false,
"sshPubKey": ""
},
"fqdn": "nb120-09.anon-Eq3hK.domain",
"RoutingPeerDnsResolutionEnabled": false
},
"remotePeers": [
{
"wgPubKey": "EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=",
"allowedIps": [
"100.103.156.130/32"
],
"sshConfig": {
"sshEnabled": false,
"sshPubKey": "c3NoLXBsYWNlaG9sZGVyLWtleQ=="
},
"fqdn": "svvi-netbird02.anon-Eq3hK.domain"
},
{
"wgPubKey": "G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=",
"allowedIps": [
"100.103.193.163/32"
],
"sshConfig": {
"sshEnabled": false,
"sshPubKey": "c3NoLXBsYWNlaG9sZGVyLWtleQ=="
},
"fqdn": "svvi-netbird03.anon-Eq3hK.domain"
}
],
"remotePeersIsEmpty": false,
"Routes": [
{
"ID": "cqc2meo9eivs73d80rig",
"Network": "10.1.0.0/24",
"NetworkType": "1",
"Peer": "EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only anon-7VJPJ.domain vlan1",
"Domains": [],
"keepRoute": false
},
{
"ID": "cq63bm09eivs73d80rg0",
"Network": "10.201.0.0/24",
"NetworkType": "1",
"Peer": "EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only verbund",
"Domains": [],
"keepRoute": false
},
{
"ID": "cqc30eg9eivs73d80rj0",
"Network": "10.5.0.0/24",
"NetworkType": "1",
"Peer": "EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only 10.5.0.0/24",
"Domains": [],
"keepRoute": false
},
{
"ID": "crumclo9eivs73fen59g",
"Network": "10.10.31.201/32",
"NetworkType": "1",
"Peer": "EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=",
"Metric": "9999",
"Masquerade": true,
"NetID": "nashv.anon-Eq3hK.domain",
"Domains": [],
"keepRoute": false
},
{
"ID": "cu0f9m09eivs738amfcg:cp4daohd612c738e2oc0",
"Network": "10.10.31.202/32",
"NetworkType": "1",
"Peer": "EwQvkLLAbpizZkvLRYUR3b2Xl52l4HKXfIOb0Lz5jCA=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only nas-uk.anon-Eq3hK.domain",
"Domains": [],
"keepRoute": false
},
{
"ID": "cu0f9m09eivs738amfcg:cs3dp109eivs73ben250",
"Network": "10.10.31.202/32",
"NetworkType": "1",
"Peer": "G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only nas-uk.anon-Eq3hK.domain",
"Domains": [],
"keepRoute": false
},
{
"ID": "cs3dqq09eivs73ben26g",
"Network": "10.1.0.0/24",
"NetworkType": "1",
"Peer": "G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only anon-7VJPJ.domain vlan1",
"Domains": [],
"keepRoute": false
},
{
"ID": "cs3dr289eivs73ben27g",
"Network": "10.10.31.201/32",
"NetworkType": "1",
"Peer": "G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=",
"Metric": "9999",
"Masquerade": true,
"NetID": "nashv.anon-Eq3hK.domain",
"Domains": [],
"keepRoute": false
},
{
"ID": "cs3dqj09eivs73ben260",
"Network": "10.201.0.0/24",
"NetworkType": "1",
"Peer": "G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only verbund",
"Domains": [],
"keepRoute": false
},
{
"ID": "cs3dqug9eivs73ben270",
"Network": "10.5.0.0/24",
"NetworkType": "1",
"Peer": "G6R0dIowqLql+rJ2+RUHcALi4kfpgoKYsOTO+RsmuyI=",
"Metric": "9999",
"Masquerade": true,
"NetID": "only 10.5.0.0/24",
"Domains": [],
"keepRoute": false
}
],
"DNSConfig": {
"ServiceEnable": true,
"NameServerGroups": [
{
"NameServers": [
{
"IP": "10.201.0.10",
"NSType": "1",
"Port": "53"
},
{
"IP": "10.201.0.11",
"NSType": "1",
"Port": "53"
}
],
"Primary": false,
"Domains": [
"anon-Eq3hK.domain",
"anon-7VJPJ.domain",
"anon-qX3Zb.domain",
"wiki.anon-txtgr.domain",
"vault.anon-txtgr.domain"
],
"SearchDomainsEnabled": true
}
],
"CustomZones": [

      }
    ]
  }
]

},
"offlinePeers": [],
"FirewallRules": [
{
"PeerIP": "100.103.156.130",
"Direction": "IN",
"Action": "ACCEPT",
"Protocol": "ALL",
"Port": ""
},
{
"PeerIP": "100.103.193.163",
"Direction": "IN",
"Action": "ACCEPT",
"Protocol": "ALL",
"Port": ""
},
{
"PeerIP": "100.103.156.130",
"Direction": "OUT",
"Action": "ACCEPT",
"Protocol": "ALL",
"Port": ""
},
{
"PeerIP": "100.103.193.163",
"Direction": "OUT",
"Action": "ACCEPT",
"Protocol": "ALL",
"Port": ""
}
],
"firewallRulesIsEmpty": false,
"routesFirewallRules": [],
"routesFirewallRulesIsEmpty": true
}
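
To cross-check the excerpt above, the route entries can be grouped by network to see which routing peers offer each one. The following is a minimal Python sketch, not part of NetBird: routes_by_network is a hypothetical helper, and it assumes a complete, valid network_map.json with the Routes, Network, and Peer fields shown here (the excerpt as posted has the custom zones cut out, so it would not parse as-is).

```python
import json
from collections import defaultdict


def routes_by_network(network_map: dict) -> dict:
    """Map each routed network to the sorted list of peer public keys offering it.

    Hypothetical helper; field names are taken from the excerpt in this thread.
    """
    grouped = defaultdict(set)
    for route in network_map.get("Routes", []):
        grouped[route["Network"]].add(route["Peer"])
    return {network: sorted(peers) for network, peers in grouped.items()}


if __name__ == "__main__":
    # Tiny stand-in for network_map.json with placeholder peer keys.
    sample = json.loads("""
    {"Routes": [
      {"Network": "10.1.0.0/24", "Peer": "peerA"},
      {"Network": "10.1.0.0/24", "Peer": "peerB"},
      {"Network": "10.5.0.0/24", "Peer": "peerA"}
    ]}
    """)
    for network, peers in routes_by_network(sample).items():
        print(network, "->", ", ".join(peers))
```

In the excerpt above, each of the five networks is listed once per routing peer.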

@roberthase
Author

To follow up on the issue, which is still affecting multiple devices a day:

When running netbird status --detail on a broken device, the local ICE candidate endpoint has the same IP as my controller.
When running netbird status --detail on a working device, the local ICE candidate endpoint has the IP of the local LAN/WLAN interface.

Moving a peer out of the group and back into it fixes the issue immediately.
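
The comparison described above could be scripted across many devices. This is a rough Python sketch under assumptions: it relies on the "ICE candidate endpoints (Local/Remote):" line format seen in the status output earlier in this thread, the helper names are made up for illustration, and the controller IP has to be supplied by hand.

```python
import re

# Matches lines such as:
#   ICE candidate endpoints (Local/Remote): 127.0.0.1:51820/10.201.0.187:58976
ICE_LINE = re.compile(r"ICE candidate endpoints \(Local/Remote\):\s*(\S+)/(\S+)")


def local_ice_ips(status_text: str) -> list:
    """Return the IP part of every local ICE candidate endpoint in the text."""
    ips = []
    for match in ICE_LINE.finditer(status_text):
        local_endpoint = match.group(1)               # e.g. "127.0.0.1:51820"
        ips.append(local_endpoint.rsplit(":", 1)[0])  # drop the port
    return ips


def uses_controller_ip(status_text: str, controller_ip: str) -> bool:
    """True if any local ICE endpoint carries the given controller IP."""
    return controller_ip in local_ice_ips(status_text)
```

Feeding it the saved output of netbird status --detail from each device, together with the controller's IP, would separate devices showing the symptom from those that report a LAN/WLAN address.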

@roberthase
Author

Seems to be the same as #3121

@lixmal
Contributor

lixmal commented Feb 6, 2025

#3121 should be relay-only. From your description, it doesn't seem to be relayed in your case.

Could you provide the debug bundle so we can debug this further? The network map is fine

@roberthase
Author

roberthase commented Feb 6, 2025

Are you sure, even though the last WireGuard handshake is not available in the netbird status -dA output?

Edit: specifically, the handshake is established with routing peer B, while the routes are assigned to routing peer A, which has no handshake.
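
This mismatch can be checked mechanically from the status text. The sketch below is only an illustration in Python, with a hypothetical function name. It assumes the layout of the netbird status -dA output posted earlier: each peer block starts with the peer name followed by a "NetBird IP:" line, and blocks are separated by blank lines.

```python
def peers_with_routes_but_no_handshake(status_text: str) -> list:
    """List peers that carry routes although no WireGuard handshake is shown."""
    peers = {}
    current = None   # name of the peer block being read, if any
    previous = ""    # last non-empty line seen
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line closes the current peer block
            continue
        if line.startswith("NetBird IP:") and previous.endswith(":"):
            current = previous[:-1]  # the line before "NetBird IP:" is the peer name
            peers[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            peers[current][key.strip()] = value.strip()
        previous = line

    flagged = []
    for name, fields in peers.items():
        has_routes = fields.get("Routes", "-") not in ("", "-")
        no_handshake = fields.get("Last WireGuard handshake", "-") in ("", "-")
        if has_routes and no_handshake:
            flagged.append(name)
    return flagged
```

Run against the output above, a check like this would be expected to flag svvi-netbird02, which lists all five routes with "Last WireGuard handshake: -", and not svvi-netbird03, which has a recent handshake but no routes.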

@lixmal
Contributor

lixmal commented Feb 6, 2025

It's most likely something else. The relay issue manifests after a longer time, not ~2 minutes.
That said, you're of course free to test again once we have located the relay-related issue.
