Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow public API usage for GetValidators #77

Merged
merged 15 commits into from
Nov 7, 2023

Conversation

michaelkaplan13
Copy link
Collaborator

Why this should be merged

These changes allow for the use of the public API endpoints (i.e. https://api.avax.network) as the p-chain-api-url configuration value of a relayer deployment. This should hopefully make standing up relayer instances easier because it no longer requires running a node that provides the full suite if AvalancheGo APIs.

This PR also includes clean up of a mis-capitalized variable name, and well as an unnecessary call to initialize an unsigned warp message instance.

How this works

The public API endpoints do not support GetValidatorsAt requests because looking up validator sets at old block heights is a potentially expensive operation. For the purposes of the relayer, it only ever needs the current validator set of a subnet, which can be queried using GetCurrentValidators, which is supported by the public API endpoints. GetCurrentValidators however only includes the BLS public key of primary network validators. In order to still provide the subnet validator BLS public keys, two RPC requests are made: the first looking up the node IDs of the subnet validators for a given subnet, and then second looking up the BLS public keys associated with the primary network validation periods for those node IDs.

How this was tested

Running a relayer instance locally using the public API endpoints at the p-chain-api-url value.

How is this documented

Comments

Copy link
Collaborator

@cam-schultz cam-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions, primarily regarding how to have the workaround as an option rather than the only behavior.

relayer/canonical_validator_client.go Outdated Show resolved Hide resolved

// Set the BLS keys of the result.
for _, primaryVdr := range primaryVdrs {
vdr, ok := res[primaryVdr.NodeID]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ever expecting primaryVdr to not be found in res? My understanding is that the above call to GetCurrentValidators applies this filter already.

Copy link
Collaborator Author

@michaelkaplan13 michaelkaplan13 Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I don't think we ever expect it currently. I'll add a warning log for this case. I think it still makes sense to continue rather than returning an error, because the relayer should be able to handle this case the same as if the given validator didn't register a BLS public key on the P-chain. As long as a sufficient % of the subnet stake weight has BLS keys registered and set here, the message should be able to be successfully relayed still.

Update: Realizing now this is separate from validators that don't have BLS keys. It's if the RPC doesn't apply the proper filter. That being said, I'll still add a warning log and still thinking continue is the best option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree.

Can we add a check below that primaryVdr.Signer != nil? And if it is, we can emit a warning and continue like we do here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think that proper check is already there now actually. I chose not to emit any log in that case because it's expected (or at least certainly possible) for the primary network validator to not have any BLS public key registered.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you feel strongly otherwise though. I'm neither here or there on the log 👍

Copy link
Collaborator

@cam-schultz cam-schultz Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but it looks like the first time we access primaryVdr.Signer it's unchecked. If it's nil, that would cause the app to panic.

EDIT: was looking at a stale commit

func (v *CanonicalValidatorClient) GetValidatorSet(
ctx context.Context,
height uint64,
subnetID ids.ID,
) (map[ids.NodeID]*validators.GetValidatorOutput, error) {
return v.client.GetValidatorsAt(ctx, subnetID, height)
// Get the current subnet validators. These validators are not expected to include
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than fully replacing the call to GetValidatorsAt, what are your thoughts on adding this workaround as either a configurable switch, or a fallback option in the case that GetValidatorsAt fails?

For a switch, we could hardcode the public API endpoint and compare the config value against that. However, there's no guarantee that GetValidatorsAt is supported on some other node.

We could always try calling GetValidatorsAt then fallback to this workaround on failure, but that's strictly worse in terms of network round trips if we're using the public API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also add a config option indicating which API methods are supported by the configured node and switch on that. Alternatively, we could handle that flag dynamically by calling this method on startup and looking for the "method not supported" error message, though that would increase the set of things that could break unexpectedly.

I'm spitballing at this point, curious to hear your thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For context, the reason I'm in favor of having some way to still call GetValidatorsAt is to have a path that minimizes latency for the relayer. If a project has a large enough number of validators, then they might want to run their own P-Chain API node to get that minimal latency path. That being said, I don't actually know if a call to GetValidatorsAt at the current height is necessarily faster than the workaround here, but I would suspect it is)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good thoughts...there are definitely trade offs one way or another at certain scales.

My main considerations are:

  • The relative frequency of relayer that don't want to set up their own nodes to use an API, and their performance requirements.
  • The relative frequency of relayer nodes that do want to set up their own nodes to use an API, and their performance requirements.
  • Code and configuration complexity.

I think this is a minute enough of a detail that adding configuration options for it is not worth the additional code complexity and mental bandwidth of those setting up nodes to try to properly understand what the proper value for their set up would be. Checking whether or not the API method is available at start up is a cool idea to get around this, but there's no guarantee than an API won't change it behavior at any point in time, so would be prone to breaking.

That makes me think either always using the workaround method, or always trying to first use GetValidatorsAt and then fallback to the workaround if needed is likely the best approach. For relayer deployments without access to their own API nodes, my inclination is that an additional request would not be a critical performance decrease. I would also guess that deployments that do have or set up their own API nodes, tend to have strictly performance expectations/requirements. Given that, I think using this option as a fallback in the even that GetValidatorsAt fails is likely the best route forward.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me. Very good point that a relayer operator concerned with latency would likely roll their own API node; if they choose not to, then one extra round trip is likely not going to make a difference to them.

// only associated with primary network validation periods. If ACP-13 is implementated in the future
// (https://github.com/avalanche-foundation/ACPs/blob/main/ACPs/13-subnet-only-validators.md), it may
// become possible to reduce this to a single RPC request that returns both the subnet validators
// as well as their BLS public keys.
func (v *CanonicalValidatorClient) GetValidatorSet(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add to the top of this file the interface implementation check for ValidatorState?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

// Set the BLS keys of the result.
for _, primaryVdr := range primaryVdrs {
vdr, ok := res[primaryVdr.NodeID]
if !ok {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add documentation that if we have a validator taht we don't find it's public key in primary validator set we still include it for mapping, so it'd affect total stake weight, and need to check whether it has a public key

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll add it below. Worth noting that this case here is actually just if the "GetCurrentValidators" returns an unexpected node ID that wasn't included in the filter in the request body. I'll document that also.

@minghinmatthewlam
Copy link

outside of PR changes, but looking through the code brought me to https://github.com/ava-labs/awm-relayer/blob/main/relayer/message_relayer.go#L146. Is the comment about "not unique validators as represented by a BLS pubkey" outdated? Looks to me that avalanchego's GetCanonicalValidatorSet will filter based on unique public key bytes.

@minghinmatthewlam
Copy link

outside of PR changes, but looking through the code brought me to https://github.com/ava-labs/awm-relayer/blob/main/relayer/message_relayer.go#L146. Is the comment about "not unique validators as represented by a BLS pubkey" outdated? Looks to me that avalanchego's GetCanonicalValidatorSet will filter based on unique public key bytes.

nvm think that comment still applies, since we go through the nodes of each validator set. Might want to pick up this TODO at some point though, outside of PR scope.

res, err := v.client.GetValidatorsAt(ctx, subnetID, height)
if err != nil {
v.logger.Debug(
"P-chain RPC to getValidatorAt returned error. Falling back to getCurrentValidators",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we include the error in log?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, added

@@ -34,10 +39,82 @@ func (v *CanonicalValidatorClient) GetSubnetID(ctx context.Context, chainID ids.
return v.client.ValidatedBy(ctx, chainID)
}

// Gets the current validator set of the given subnet ID, including the validators' BLS public keys.
// This implementation of GetValidatorSet currently makes two RPC requests, one to get the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be updated to getCurrentValidatorSet?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated the comment to remove the name entirely since getCurrentValidatorSet isn't a method in the interface being implemented.

Copy link

@minghinmatthewlam minghinmatthewlam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some nit questions. Also, is it a concern that the canonical_validator_client implements GetValidatorsAt(ctx, subnetID, height) by always querying with current height? I think we only use it when getting signatures so always use latest block height, but seems like we're changing the semantics of the method.

@michaelkaplan13
Copy link
Collaborator Author

LGTM, left some nit questions. Also, is it a concern that the canonical_validator_client implements GetValidatorsAt(ctx, subnetID, height) by always querying with current height? I think we only use it when getting signatures so always use latest block height, but seems like we're changing the semantics of the method.

Yeah, GetValidatorSet is used by the relayer to know which nodes it should query for signatures for a message it is attempting to relay, so using the current P-chain block height is the desired behavior in this case. I added an additional comment on GetValidatorSet noting that the implementation includes a fallback that ignores the P-chain block height. 👍

Copy link
Collaborator

@cam-schultz cam-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments 👍

)

var _ validators.State = (*CanonicalValidatorClient)(nil)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the syntax &CanonicalValidatorClient{} on the rhs is a bit cleaner, and it's also what we've used in other places in our repos.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. There was one other place in this repo that we were using the (*Type)(nil) syntax, so I updated it there as well 👍

}

func NewCanonicalValidatorClient(client platformvm.Client) *CanonicalValidatorClient {
func NewCanonicalValidatorClient(logger logging.Logger, client platformvm.Client) *CanonicalValidatorClient {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the logger isn't being set in the object instantiation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, my bad. Thanks for catching this.

Copy link
Collaborator

@cam-schultz cam-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaelkaplan13
Copy link
Collaborator Author

I just ran these changes against the testnet subnets to confirm they're working as expected:

{"level":"debug","timestamp":"2023-11-07T16:53:09.324Z","logger":"awm-relayer","caller":"relayer/canonical_validator_client.go:117","msg":"P-chain RPC to getValidatorAt returned error. Falling back to getCurrentValidators","subnetID":"1M439meCk9SDQYJSrDBJJeRntk1FtmSaaZakSeLFvcJonxGUM","pChainHeight":125467,"error":"failed to decode client response: the method platform.getValidatorsAt is not available"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:205","msg":"Relayer collecting signatures from peers.","attempt":1,"destinationChainID":"2Pvk9oh8fxUizgUT3zvBJzxkNw1KiXUEHCi9jnu2ep9PTxUZka","validatorSetSize":4,"signatureMapSize":0,"responsesExpected":4}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:224","msg":"Added node ID to query.","nodeID":"NodeID-EHV8d1ezFyoA8DMzTU3WZXfVR95R7v1cv"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:224","msg":"Added node ID to query.","nodeID":"NodeID-WqHQbG5MYQAYBRzjJV3qqy263mbz6iko"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:224","msg":"Added node ID to query.","nodeID":"NodeID-N1dQ3zWTgcpnxDqejNYLevSQQPGNZSgvg"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:224","msg":"Added node ID to query.","nodeID":"NodeID-GEC1zVazREQmLUpj1fF2qvuATpRvMgf62"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:241","msg":"Sent signature request to network","messageID":"g4BeQDFmKyHNYDzrpFDkGWyHZwuUEZxuPuEpSEioZLVRkRj2d","sentTo":{"NodeID-EHV8d1ezFyoA8DMzTU3WZXfVR95R7v1cv":{},"NodeID-GEC1zVazREQmLUpj1fF2qvuATpRvMgf62":{},"NodeID-N1dQ3zWTgcpnxDqejNYLevSQQPGNZSgvg":{},"NodeID-WqHQbG5MYQAYBRzjJV3qqy263mbz6iko":{}}}
{"level":"debug","timestamp":"2023-11-07T16:53:09.373Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:262","msg":"Processing response from node","nodeID":"NodeID-N1dQ3zWTgcpnxDqejNYLevSQQPGNZSgvg"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.374Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:282","msg":"Skipping irrelevant app response"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.451Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"app_response"}
{"level":"info","timestamp":"2023-11-07T16:53:09.451Z","logger":"awm-relayer","caller":"peers/external_handler.go:82","msg":"handling app response","from":"NodeID-EHV8d1ezFyoA8DMzTU3WZXfVR95R7v1cv"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.451Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:262","msg":"Processing response from node","nodeID":"NodeID-EHV8d1ezFyoA8DMzTU3WZXfVR95R7v1cv"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.452Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:299","msg":"Got valid signature response","nodeID":"NodeID-EHV8d1ezFyoA8DMzTU3WZXfVR95R7v1cv"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.453Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"app_response"}
{"level":"info","timestamp":"2023-11-07T16:53:09.453Z","logger":"awm-relayer","caller":"peers/external_handler.go:82","msg":"handling app response","from":"NodeID-GEC1zVazREQmLUpj1fF2qvuATpRvMgf62"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.453Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:262","msg":"Processing response from node","nodeID":"NodeID-GEC1zVazREQmLUpj1fF2qvuATpRvMgf62"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.454Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"app_response"}
{"level":"info","timestamp":"2023-11-07T16:53:09.454Z","logger":"awm-relayer","caller":"peers/external_handler.go:82","msg":"handling app response","from":"NodeID-WqHQbG5MYQAYBRzjJV3qqy263mbz6iko"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.455Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:299","msg":"Got valid signature response","nodeID":"NodeID-GEC1zVazREQmLUpj1fF2qvuATpRvMgf62"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.455Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:262","msg":"Processing response from node","nodeID":"NodeID-WqHQbG5MYQAYBRzjJV3qqy263mbz6iko"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.456Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"app_response"}
{"level":"info","timestamp":"2023-11-07T16:53:09.456Z","logger":"awm-relayer","caller":"peers/external_handler.go:82","msg":"handling app response","from":"NodeID-N1dQ3zWTgcpnxDqejNYLevSQQPGNZSgvg"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.456Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:299","msg":"Got valid signature response","nodeID":"NodeID-WqHQbG5MYQAYBRzjJV3qqy263mbz6iko"}
{"level":"info","timestamp":"2023-11-07T16:53:09.456Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:346","msg":"Created signed message.","destinationChainID":"2Pvk9oh8fxUizgUT3zvBJzxkNw1KiXUEHCi9jnu2ep9PTxUZka"}
{"level":"info","timestamp":"2023-11-07T16:53:09.456Z","logger":"awm-relayer","caller":"teleporter/message_manager.go:232","msg":"Sending message to destination chain","destinationChainID":"2Pvk9oh8fxUizgUT3zvBJzxkNw1KiXUEHCi9jnu2ep9PTxUZka","warpMessageID":"g4BeQDFmKyHNYDzrpFDkGWyHZwuUEZxuPuEpSEioZLVRkRj2d","teleporterMessageID":"756"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.686Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"put"}
{"level":"debug","timestamp":"2023-11-07T16:53:09.988Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"put"}
{"level":"debug","timestamp":"2023-11-07T16:53:10.197Z","logger":"awm-relayer","caller":"peers/external_handler.go:77","msg":"receiving message","op":"put"}
{"level":"info","timestamp":"2023-11-07T16:53:10.261Z","logger":"awm-relayer","caller":"evm/destination_client.go:185","msg":"Sent transaction","txID":"0xc42453567e369641033147c7aa834e8bcbcdc70f30f39909b41988fa2e15d6e8"}
{"level":"info","timestamp":"2023-11-07T16:53:10.261Z","logger":"awm-relayer","caller":"teleporter/message_manager.go:286","msg":"Sent message to destination chain","destinationChainID":"2Pvk9oh8fxUizgUT3zvBJzxkNw1KiXUEHCi9jnu2ep9PTxUZka","warpMessageID":"g4BeQDFmKyHNYDzrpFDkGWyHZwuUEZxuPuEpSEioZLVRkRj2d","teleporterMessageID":"756"}
{"level":"info","timestamp":"2023-11-07T16:53:10.261Z","logger":"awm-relayer","caller":"relayer/message_relayer.go:120","msg":"Finished relaying message to destination chain","destinationChainID":"2Pvk9oh8fxUizgUT3zvBJzxkNw1KiXUEHCi9jnu2ep9PTxUZka"}

@michaelkaplan13 michaelkaplan13 merged commit 515cb8a into main Nov 7, 2023
7 checks passed
@michaelkaplan13 michaelkaplan13 deleted the allow-public-api-for-get-validators branch November 7, 2023 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants