Discovery metrics #21

Open
Tracked by #221
fryorcraken opened this issue Jun 20, 2024 · 8 comments

@fryorcraken

  1. Report number of nodes found, per discovery strategy (if possible)
  2. Report number of nodes successfully connected to
  3. Report number of nodes that were freshly discovered but connection failed
  4. Report whether our own node is marked as discoverable. It seems knowing our own TCP port is the best bet.
@chaitanyaprem

Sounds good. We need to make sure that point 1 reports the number of unique nodes found, not the same ones repeated (which has been noticed with the current discv5).

Regarding point 4: we can report this based on the response from AutoNAT, which is not exposed to status-go as of now. This is something that can be included in go-waku as an API, or maybe we can access the libp2p API directly. @richard-ramos has a better idea about this.
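The deduplication asked for in point 1 could be sketched roughly as follows. This is a minimal illustration, not code from go-waku or status-go: `Strategy`, `UniqueNodeCounter`, and the strategy names passed in by callers are hypothetical, and peers are assumed to be identified by a string peer ID. The sketch is not safe for concurrent use without an added lock.

```go
package main

// Strategy names a discovery mechanism. Values such as "discv5" are
// supplied by the caller and are illustrative only.
type Strategy string

// UniqueNodeCounter counts distinct peer IDs per discovery strategy,
// so that a peer returned repeatedly is reported only once.
type UniqueNodeCounter struct {
	seen map[Strategy]map[string]struct{}
}

// NewUniqueNodeCounter returns an empty counter.
func NewUniqueNodeCounter() *UniqueNodeCounter {
	return &UniqueNodeCounter{seen: make(map[Strategy]map[string]struct{})}
}

// Record notes that peerID was returned by strategy s. It reports whether
// this is the first time that strategy returned the peer.
func (c *UniqueNodeCounter) Record(s Strategy, peerID string) bool {
	peers, ok := c.seen[s]
	if !ok {
		peers = make(map[string]struct{})
		c.seen[s] = peers
	}
	if _, dup := peers[peerID]; dup {
		return false
	}
	peers[peerID] = struct{}{}
	return true
}

// Count returns the number of distinct peers found through strategy s.
func (c *UniqueNodeCounter) Count(s Strategy) int {
	return len(c.seen[s])
}

// Total returns the number of distinct peers found across all strategies.
func (c *UniqueNodeCounter) Total() int {
	all := make(map[string]struct{})
	for _, peers := range c.seen {
		for id := range peers {
			all[id] = struct{}{}
		}
	}
	return len(all)
}
```

Reporting both `Count` per strategy and `Total` keeps a peer found by two strategies from being counted twice in the overall figure.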

@richard-ramos
Member

Report whether our own node is marked as discoverable. It seems knowing our own TCP port is the best bet.

There are different sources to know if your node is discoverable or not:

  1. Subscribing to reachability changes from go-libp2p: a node can be Private or Public, with private nodes still being accessible through circuit relay. Public nodes have an external IP address.
  2. Having a circuit relay address in the ENR.

It's also worth taking into account that a node can change its reachability during runtime (for example, if you switch networks or if the circuit relay node goes offline).
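A rough sketch of how the two sources could be combined into one "discoverable" flag that is re-evaluated whenever reachability changes. Everything here is an assumption for illustration: the `Reachability` type is a local stand-in so the example compiles without go-libp2p, `DiscoverabilityState` is hypothetical, treating "Public, or a circuit relay address in the ENR" as discoverable is one possible reading of the list above, and relay addresses are detected by looking for the `/p2p-circuit` component in the multiaddr string.

```go
package main

import "strings"

// Reachability is a local stand-in for the reachability states a libp2p
// node can report. It is not the go-libp2p type.
type Reachability int

const (
	ReachabilityUnknown Reachability = iota
	ReachabilityPublic
	ReachabilityPrivate
)

// DiscoverabilityState holds the latest inputs (hypothetical type).
type DiscoverabilityState struct {
	Reachability Reachability
	ENRAddrs     []string // multiaddrs advertised in the ENR, as strings
}

// hasCircuitRelayAddr reports whether any address goes through a circuit relay.
func hasCircuitRelayAddr(addrs []string) bool {
	for _, a := range addrs {
		if strings.Contains(a, "/p2p-circuit") {
			return true
		}
	}
	return false
}

// Discoverable applies the assumed rule: the node is public, or it
// advertises at least one circuit relay address.
func (s *DiscoverabilityState) Discoverable() bool {
	return s.Reachability == ReachabilityPublic || hasCircuitRelayAddr(s.ENRAddrs)
}

// Update stores new inputs and reports whether discoverability flipped,
// which is the moment a metric would need to be re-sent.
func (s *DiscoverabilityState) Update(r Reachability, addrs []string) bool {
	before := s.Discoverable()
	s.Reachability = r
	s.ENRAddrs = addrs
	return before != s.Discoverable()
}
```

Calling `Update` from whatever handler receives reachability or ENR changes means the metric follows network switches and relay outages instead of being computed once at startup.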

@fryorcraken
Author

I think it would be good to also add:

  1. Whether a node was able to get an external port and IP via NAT negotiation
  2. Whether holepunching worked via circuit relay

@adklempner adklempner self-assigned this Jun 24, 2024
@adklempner
Collaborator

An interesting idea brought up by @danisharora099 is that if a node has a bunch of discovered peers but cannot establish any connections, the telemetry service can also try connecting to those peers to determine if they are truly inaccessible, or if the node reporting the metrics is misconfigured, or if the peers are just not reachable from the node's environment

@chaitanyaprem

An interesting idea brought up by @danisharora099 is that if a node has a bunch of discovered peers but cannot establish any connections, the telemetry service can also try connecting to those peers to determine if they are truly inaccessible, or if the node reporting the metrics is misconfigured, or if the peers are just not reachable from the node's environment

Maybe the telemetry service doesn't need to do this. Rather, it can just collect data from the various nodes that report the connectivity status of a peer and record that information, i.e., how many nodes were able to connect to it and how many failed.
It is possible that the peer has reached its connection limit and is therefore dropping connections, or that there is some other reason.
We can probably deduce such information by gathering it from other nodes rather than having the telemetry service do this itself.

@chaitanyaprem

An interesting idea brought up by @danisharora099 is that if a node has a bunch of discovered peers but cannot establish any connections, the telemetry service can also try connecting to those peers to determine if they are truly inaccessible, or if the node reporting the metrics is misconfigured, or if the peers are just not reachable from the node's environment

Maybe the telemetry service doesn't need to do this. Rather, it can just collect data from the various nodes that report the connectivity status of a peer and record that information, i.e., how many nodes were able to connect to it and how many failed. It is possible that the peer has reached its connection limit and is therefore dropping connections, or that there is some other reason. We can probably deduce such information by gathering it from other nodes rather than having the telemetry service do this itself.
It may also be that the ip-colocation-limit has been reached, causing the node to reject or drop connections.

We have many safety checks like this to prevent a node from being targeted.
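The aggregation described above, counting how many reporting nodes could and could not connect to a given peer, might look something like this on the telemetry side. The record shape (`ConnReport`), the `PeerStats` type, and the `minReporters` cutoff are all hypothetical. The sketch keeps only the latest report per reporter and peer, which is an assumption made so that a node retrying the same peer is counted once.

```go
package main

// ConnReport is a hypothetical telemetry record: one node reporting the
// outcome of a connection attempt to one peer.
type ConnReport struct {
	Reporter string
	Peer     string
	Success  bool
}

// PeerStats counts distinct reporters by their latest outcome for a peer.
type PeerStats struct {
	Succeeded int
	Failed    int
}

// LikelyUnreachable reports whether no reporter could connect and at least
// minReporters tried. A mixed result instead suggests connection limits on
// the peer or a problem in some reporters' environments.
func (p PeerStats) LikelyUnreachable(minReporters int) bool {
	return p.Succeeded == 0 && p.Failed >= minReporters
}

// Aggregate groups reports by peer. Reports are assumed to be in
// chronological order, so later entries overwrite earlier ones from the
// same reporter for the same peer.
func Aggregate(reports []ConnReport) map[string]PeerStats {
	type key struct{ peer, reporter string }
	latest := make(map[key]bool)
	for _, r := range reports {
		latest[key{r.Peer, r.Reporter}] = r.Success
	}
	stats := make(map[string]PeerStats)
	for k, ok := range latest {
		s := stats[k.peer]
		if ok {
			s.Succeeded++
		} else {
			s.Failed++
		}
		stats[k.peer] = s
	}
	return stats
}
```

With this shape the telemetry service never dials peers itself; it only needs enough independent reporters per peer for the counts to be meaningful.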

@adklempner
Collaborator

adklempner commented Jul 12, 2024

Track ratio of discovered nodes to connected nodes
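As a small sketch of that ratio, assuming it is expressed as the share of discovered nodes that ended up connected (the direction of the ratio and the function name `ConnectedRatio` are assumptions, not something fixed in this thread):

```go
package main

// ConnectedRatio returns connected/discovered. The second result is false
// when no nodes were discovered, so callers can tell "no data" apart from
// "zero connections".
func ConnectedRatio(discovered, connected int) (float64, bool) {
	if discovered <= 0 {
		return 0, false
	}
	return float64(connected) / float64(discovered), true
}
```

Feeding it unique-node counts rather than raw discovery events avoids the ratio being dragged down by peers that discovery returns repeatedly.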

@adklempner
Collaborator

adklempner commented Jul 17, 2024

When a connection fails, record what kind of connection it was. i.e. did a desktop node get a port from the router and advertise out of the box, or did it use hole-punching via circuit relay.

Metrics should help answer the following questions:

Does holepunching work in general?

If fewer than 50% (or maybe 80%) of holepunching attempts succeed, it may be worth simply discarding this option entirely.

Another question: how many connections to filter nodes does the mobile app try to maintain, how many attempts are made, and so on?

If we target 20 connections and 10 fail, we still have the other half, which should be enough to get subscriptions on 2 filter nodes and have some backup too.

In short, what kind of connection was a failed connection?
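One way to answer these questions from recorded attempts is to tag each attempt with its connection kind and compare success rates against a threshold. The sketch below is illustrative only: `ConnKind`, its two values, `Attempt`, `KindStats`, and `WorthKeeping` are hypothetical names, and the threshold is passed in by the caller because the thread leaves it open (50% or maybe 80%).

```go
package main

// ConnKind labels how a connection was attempted (hypothetical values).
type ConnKind string

const (
	KindDirect    ConnKind = "direct"     // port obtained from the router
	KindHolePunch ConnKind = "hole-punch" // hole punching via circuit relay
)

// Attempt is one recorded connection attempt.
type Attempt struct {
	Kind    ConnKind
	Success bool
}

// KindStats counts attempts and failures for one connection kind.
type KindStats struct {
	Attempts int
	Failures int
}

// SuccessRate returns the fraction of attempts that succeeded,
// or 0 when there were no attempts.
func (k KindStats) SuccessRate() float64 {
	if k.Attempts == 0 {
		return 0
	}
	return float64(k.Attempts-k.Failures) / float64(k.Attempts)
}

// Summarize groups attempts by connection kind.
func Summarize(attempts []Attempt) map[ConnKind]KindStats {
	out := make(map[ConnKind]KindStats)
	for _, a := range attempts {
		s := out[a.Kind]
		s.Attempts++
		if !a.Success {
			s.Failures++
		}
		out[a.Kind] = s
	}
	return out
}

// WorthKeeping reports whether a kind has data and meets the threshold,
// for example 0.5 or 0.8.
func WorthKeeping(s KindStats, threshold float64) bool {
	return s.Attempts > 0 && s.SuccessRate() >= threshold
}
```

Because failures are counted per kind, the same summary shows both whether holepunching works in general and which kind of connection a failed connection was.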
