node: Integrate Transfer Verifier into the Ethereum watcher #4233

Draft
wants to merge 24 commits into
base: main

Contributor

@johnsaigle johnsaigle commented Jan 22, 2025

New Features

  • Create a TxVerifier instance in the Run function of the EVM watcher
  • Wrap all instances where a message would be published to the broadcasting msg channel with a new function, verifyAndPublish
  • Messages are published with a new status: VerificationState.
  • Modified the Transfer Verifier pkg API so that we can call the method with either an existing Receipt or a txHash.
  • Added unit test in the watcher for new functionality

If Transfer Verifier is enabled

  • Messages that are token transfers will undergo Transfer Verification
  • Messages will be published with a status of Valid or Rejected depending on the result
  • The calling code can then decide what to do based on this status

If Transfer Verifier is not enabled

  • Existing behaviour will be preserved, but messages will be published with a status of NotApplicable. No further actions are taken when a Message Publication has this status
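
The two modes described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Message`, `verifyFunc`, and the ordering of the `VerificationState` constants are assumptions made for the example, and treating non-transfer messages as NotApplicable is likewise an assumption. Only the state names and the `verifyAndPublish` name come from the description.

```go
package main

import "fmt"

// VerificationState mirrors the status names used in this PR.
// The ordering below is an assumption; only the names come from the description.
type VerificationState uint8

const (
	NotVerified VerificationState = iota // zero value: no verification attempted yet
	NotApplicable
	Valid
	Rejected
	Anomalous
)

func (v VerificationState) String() string {
	names := [...]string{"NotVerified", "NotApplicable", "Valid", "Rejected", "Anomalous"}
	if int(v) < len(names) {
		return names[v]
	}
	return "Unknown"
}

// Message is a hypothetical, simplified stand-in for MessagePublication.
type Message struct {
	TxID            string
	IsTokenTransfer bool
	State           VerificationState
}

// verifyFunc is a hypothetical stand-in for the Transfer Verifier call.
// A nil verifyFunc models "Transfer Verifier is not enabled".
type verifyFunc func(txID string) bool

// verifyAndPublish assigns a state and then publishes unconditionally.
// Deciding what to do with the state is left to the consumer of out.
func verifyAndPublish(msg Message, verify verifyFunc, out chan<- Message) {
	switch {
	case verify == nil || !msg.IsTokenTransfer:
		msg.State = NotApplicable
	case verify(msg.TxID):
		msg.State = Valid
	default:
		msg.State = Rejected
	}
	out <- msg
}

func main() {
	out := make(chan Message, 2)
	alwaysValid := func(string) bool { return true }

	// Verifier enabled: the token transfer is checked.
	verifyAndPublish(Message{TxID: "0xabc", IsTokenTransfer: true}, alwaysValid, out)
	// Verifier disabled: the message passes through untouched apart from its state.
	verifyAndPublish(Message{TxID: "0xdef", IsTokenTransfer: true}, nil, out)

	fmt.Println((<-out).State, (<-out).State)
}
```

The point of the sketch is that the message reaches the channel in every branch; the state is metadata for the consumer, not a gate.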

Design Considerations

Modifying MessagePublication

This PR modifies MessagePublication to add a new status based on whether the Message Publication is verified. This decision was made to handle Transfer Verification cases across many chains. For example, the EVM logs are reliable enough that we can confidently rule a message as Valid or Rejected. Other ecosystems (e.g. Sui, but perhaps also Solana) are not so clear-cut. In those cases, we may want to mark a transaction as Anomalous rather than outright rejecting it.

Using this new enum allows us to do this.

Other potential benefits:

  • Decouples the Verification of a Message from whether or not this can be published.
  • Avoids scope-creep for the Watchers: they only watch messages, but do not need to reject them. (Instead this could be handled by the processor or some other security mechanism akin to the Governor or Accountant.)
  • Allows configuring targeted action on a per-chain and per-status basis. For example, we may want to delay Anomalous messages but drop Rejected ones.
  • Preserves a NotApplicable state that can be used as a fallback mechanism if the Transfer Verifier is disabled outright or on a particular chain
  • This status could be used in other cases beyond Transfer Verification, but should not interfere with existing message handling.
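
As a rough illustration of the per-chain, per-status idea in the list above, a consumer could look up an action in a small policy table. Everything here is hypothetical: the `Action` type, the `Policy` map, the chain IDs, and the default of processing normally are assumptions for the sketch, since reacting to a status is explicitly out of scope for this PR.

```go
package main

import "fmt"

// ChainID is a simplified stand-in for vaa.ChainID.
type ChainID uint16

// VerificationState uses the names from this PR; the ordering is an assumption.
type VerificationState uint8

const (
	NotVerified VerificationState = iota
	NotApplicable
	Valid
	Rejected
	Anomalous
)

// Action is a hypothetical reaction to a verification state.
type Action uint8

const (
	Process Action = iota // handle the message as usual
	Delay                 // hold the message back for review
	Drop                  // discard the message
)

// Policy maps chain -> state -> action. Missing entries fall back to Process,
// so chains without an entry keep their existing behaviour.
type Policy map[ChainID]map[VerificationState]Action

// ActionFor returns the configured action, or Process if none is configured.
func (p Policy) ActionFor(chain ChainID, state VerificationState) Action {
	if byState, ok := p[chain]; ok {
		if action, ok := byState[state]; ok {
			return action
		}
	}
	return Process
}

func main() {
	const chainA, chainB ChainID = 2, 21 // illustrative IDs only

	policy := Policy{
		chainA: {Rejected: Drop},
		chainB: {Rejected: Drop, Anomalous: Delay},
	}

	fmt.Println(policy.ActionFor(chainB, Anomalous) == Delay)
	fmt.Println(policy.ActionFor(chainA, NotApplicable) == Process)
}
```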

Scope of this PR

The idea with this PR is to test the modifications to the Ethereum watcher without enabling the transfer verifier yet. We can release this in testnet or mainnet to be sure that the changes here are stable.

Once we're confident that the underlying mechanism is working well, we can add some logic in the Guardian to actually react when a MessagePublication has a status like Rejected or Anomalous. For now, changing the reaction of the watcher to a "bad" message is out of scope.

Questions

  • Is it necessary to update the Protobuf files somewhere in order to capture the changes to Message Publication?

Related Work

Collaborator

@banescusebi banescusebi left a comment

Just a couple of suggestions

@johnsaigle johnsaigle force-pushed the tv-eth-watcher branch 3 times, most recently from 688ee88 to 96a2566 Compare February 27, 2025 20:36
@johnsaigle johnsaigle marked this pull request as ready for review February 28, 2025 15:58
Collaborator

@djb15 djb15 left a comment

Integration looks good overall! Would be good to deal with the outstanding TODOs if possible

switch w.env {
case common.UnsafeDevNet:
tbridge = eth_common.BytesToAddress(sdk.KnownDevnetTokenbridgeEmitters[w.chainID])
weth = eth_common.HexToAddress("0xDDb64fE46a91D46ee29420539FC25FD07c5FEa3E")
Collaborator

Re the TODO, how about creating a set of Known[X]NetWETHAddress constants in the SDK alongside the existing constants?

w.txVerifier, tvErr = txverifier.NewTransferVerifier(
w.ethConn,
&addrs,
20,
Collaborator

Comment to explain the choice of magic number?

@johnsaigle johnsaigle marked this pull request as draft March 3, 2025 18:02
@johnsaigle johnsaigle marked this pull request as ready for review March 3, 2025 20:31
@johnsaigle johnsaigle requested review from pleasew8t and djb15 March 3, 2025 20:31
@johnsaigle johnsaigle marked this pull request as draft March 3, 2025 20:35
Comment on lines 200 to 201
var KnownWrappedNativeAddress = map[vaa.ChainID]common.Address{
// WETH
Contributor

It makes sense to try and maintain consistency here by typing the mapping as map[vaa.ChainID]string.

@@ -41,3 +42,9 @@ var KnownDevnetAutomaticRelayerEmitters = []struct {
{ChainId: vaa.ChainIDEthereum, Addr: "000000000000000000000000cc680d088586c09c3e0e099a676fa4b6e42467b4"},
{ChainId: vaa.ChainIDBSC, Addr: "000000000000000000000000cc680d088586c09c3e0e099a676fa4b6e42467b4"},
}

// KnownDevnetWrappedNativeAddress is a map of wrapped native addresses by chain ID, e.g. WETH for Ethereum
var KnownDevnetWrappedNativeAddresses = map[vaa.ChainID]common.Address{
Contributor

Same comment here about typing to map[vaa.ChainID]string.

Comment on lines 110 to 111
var KnownTestnetWrappedNativeAddresses = map[vaa.ChainID]common.Address{
// WETH
Contributor

Same comment here about typing to map[vaa.ChainID]string. I'd also argue that we can have one less import in this file by doing so.

Comment on lines 21 to 23
// This type represents whether a message has been verified for some definition of verification. This may mean different things
// on a per-application or per-chain basis.
// NOTE: This status is currently used only by the Transfer Verifier for supported chains.
Contributor

Suggested change
- // This type represents whether a message has been verified for some definition of verification. This may mean different things
- // on a per-application or per-chain basis.
- // NOTE: This status is currently used only by the Transfer Verifier for supported chains.
+ // The `VerificationState` is the result of applying transfer verification to the transaction associated with the `MessagePublication`.
+ // While this could likely be extended to additional security controls in the future, it is only used for `txverifier` at present.


const (
// The default state for a message. This can be used before verification occurs. If no verification is required, `NotApplicable` should be used instead.
// NOTE: this value is used as the default, zero-value for the type, so this field should not be re-ordered among other variants.
Contributor

Can remove this line

Rejected
// Represents an unusual state after validation, neither confirmed to be good or bad.
Anomalous
// Represents a "known good" status where a Message has been validated and the result is good. The message should be process normally.
Contributor

Suggested change
- // Represents a "known good" status where a Message has been validated and the result is good. The message should be process normally.
+ // Represents a "known good" status where a Message has been validated and the result is good. The message should be processed normally.

@@ -33,6 +52,9 @@ type MessagePublication struct {
// Unreliable indicates if this message can be reobserved. If a message is considered unreliable it cannot be
// reobserved.
Unreliable bool
// This type represents whether a message has been verified for some definition of verification. This may mean different things
Contributor

Reuse comment above VerificationState type definition.

)
}

func (v VerificationState) String() string {
Contributor

Could move this definition up to the type definition.

}
}

updateErr := msg.SetVerificationState(verificationState)
Contributor

I wonder about this pattern. My thinking is that before transfer verification this logic did not matter, and therefore the guardian would continue processing transfers regardless of the transfer verification operations. Should we not have a verification state that indicates this error occurred, keep logging it, and pass the message to the msg channel anyway? Until we see this succeeding in a live environment, we should probably make transfer verification observant rather than interacting, if that makes sense.

Contributor Author

I think that's what happens generally. This error is more of a type-safety thing to prevent overwriting an existing state. It could log an error instead of returning I suppose.

Contributor Author

I moved this inside of the if statement in this function. In the previous commit, the state of the message would be changed from NotVerified to NotApplicable, and potentially return an error. Now, the error handling and state change will not occur unless transfer verifier is enabled.
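
For readers following this thread, the "type-safety" guard being discussed could look something like the sketch below. It is an assumption about the shape of the check, not the PR's implementation: the `Message` type, the error value, and the exact rule (only a message still in NotVerified may be assigned a state) are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// VerificationState uses the names from this PR; the ordering is an assumption.
type VerificationState uint8

const (
	NotVerified VerificationState = iota
	NotApplicable
	Valid
	Rejected
	Anomalous
)

// errStateAlreadySet is a hypothetical sentinel error for this sketch.
var errStateAlreadySet = errors.New("verification state already set")

// Message is a hypothetical stand-in for MessagePublication.
type Message struct {
	state VerificationState
}

// State returns the current verification state.
func (m *Message) State() VerificationState { return m.state }

// SetVerificationState assigns a state once. A second assignment returns an
// error instead of silently overwriting the earlier result.
func (m *Message) SetVerificationState(s VerificationState) error {
	if m.state != NotVerified {
		return fmt.Errorf("%w: current=%d requested=%d", errStateAlreadySet, m.state, s)
	}
	m.state = s
	return nil
}

func main() {
	msg := &Message{}

	// First assignment succeeds.
	fmt.Println(msg.SetVerificationState(Valid))

	// A later attempt is refused; the caller can log the error and still
	// publish the message, which keeps verification observational.
	if err := msg.SetVerificationState(Rejected); err != nil {
		fmt.Println("logged, not fatal:", err)
	}
	fmt.Println(msg.State() == Valid)
}
```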

@@ -701,14 +783,25 @@ func (w *Watcher) postMessage(logger *zap.Logger, ev *ethabi.AbiLogMessagePublis
zap.Uint8("ConsistencyLevel", ev.ConsistencyLevel),
)

w.msgC <- message
ctx, cancel := context.WithCancel(context.Background())
Contributor

I took a cue from the Sui watcher and made it so the sui tx verifier uses context.WithTimeout, which kind of makes more sense to me. Unless there is something specific that cancels the context in the event of an error.

Contributor Author

I think I chose WithCancel here because other parts of the watcher do this, and they can be cancelled if the code panics. Basically I'm going off of this:

// later on we will spawn multiple go-routines through `RunWithScissors`, i.e. catching panics.
// If any of them panic, this function will return, causing this child context to be canceled
// such that the other go-routines can free up resources
ctx, watcherContextCancelFunc := context.WithCancel(parentCtx)
defer watcherContextCancelFunc()
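
The two approaches in this thread are not mutually exclusive. A minimal sketch, assuming a hypothetical `verifyWithTimeout` helper and a simulated slow call (neither is part of this PR): deriving a `context.WithTimeout` from a cancellable parent gives a context that ends on whichever comes first, the deadline or the parent's cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// verifyWithTimeout runs verify under a deadline derived from parent.
// The derived context is done when the timeout elapses OR parent is cancelled.
func verifyWithTimeout(parent context.Context, timeout time.Duration, verify func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // always release the timer
	return verify(ctx)
}

// slowVerify simulates an RPC-backed check that takes d to finish and
// returns early with the context's error if the context ends first.
func slowVerify(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo reports whether the deadline path and the parent-cancel path both fire.
func demo() (deadlineHit, parentCancelHit bool) {
	parent, cancel := context.WithCancel(context.Background())

	// The call is slower than the timeout, so the deadline ends it.
	err := verifyWithTimeout(parent, 10*time.Millisecond, slowVerify(time.Second))
	deadlineHit = errors.Is(err, context.DeadlineExceeded)

	// Cancelling the parent (as the watcher does on panic) ends the child too.
	cancel()
	err = verifyWithTimeout(parent, time.Second, slowVerify(time.Second))
	parentCancelHit = errors.Is(err, context.Canceled)

	return deadlineHit, parentCancelHit
}

func main() {
	fmt.Println(demo())
}
```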


if pubErr != nil {
w.logger.Error("Error when publishing message", zap.Error(err))
continue
Contributor

If we do decide to continue sending the message publication over the channel, regardless of potential runtime errors, remember to remove the continue so the number of observations is still incremented.

Contributor Author

I removed the continue statements in the most recent commit

@@ -146,6 +146,7 @@ spec:
- --ccqEnabled=true
- --ccqAllowedRequesters=beFA429d57cD18b7F8A4d91A2da9AB4AF05d0FBe,25021A4FCAf61F2EADC8202D3833Df48B2Fa0D54
- --ccqAllowedPeers=12D3KooWSnju8zhywCYVi2JwTqky1sySPnmtYLsHHzc4WerMnDQH,12D3KooWM6WqedfR6ehtTd1y6rJu3ZUrEkTjcJJnJZYesjd89zj8
# - --transferVerifierEnabledChains=1
Contributor

Is this meant to be commented?

Contributor Author

Yes, similar to Governor in the same file.

Comment on lines 1926 to 1927
// NOTE: Using a known capacity and counter here avoids unnecessary reallocations compared to using `append()`.
enabled := make([]vaa.ChainID, len(parsed))
Contributor

Why not just append? This is startup, which I don't think is that performance-sensitive.

Contributor Author

Agreed but I don't think there's much benefit to append
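
For comparison, here are the two styles side by side. This is a sketch with hypothetical function names and a simplified `ChainID` type, assuming the flag value has already been split into strings. Both variants allocate once; the `append` form with a zero length and preset capacity avoids reallocation as well, so the choice is mostly about readability.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ChainID is a simplified stand-in for vaa.ChainID.
type ChainID uint16

// parseIndexed preallocates len(parsed) elements and writes by index.
func parseIndexed(parsed []string) ([]ChainID, error) {
	enabled := make([]ChainID, len(parsed))
	for i, s := range parsed {
		id, err := strconv.ParseUint(strings.TrimSpace(s), 10, 16)
		if err != nil {
			return nil, fmt.Errorf("invalid chain ID %q: %w", s, err)
		}
		enabled[i] = ChainID(id)
	}
	return enabled, nil
}

// parseAppend reserves capacity up front (length 0) and appends.
// No reallocation happens because the capacity is never exceeded.
func parseAppend(parsed []string) ([]ChainID, error) {
	enabled := make([]ChainID, 0, len(parsed))
	for _, s := range parsed {
		id, err := strconv.ParseUint(strings.TrimSpace(s), 10, 16)
		if err != nil {
			return nil, fmt.Errorf("invalid chain ID %q: %w", s, err)
		}
		enabled = append(enabled, ChainID(id))
	}
	return enabled, nil
}

func main() {
	parsed := strings.Split("2,21", ",")
	a, _ := parseIndexed(parsed)
	b, _ := parseAppend(parsed)
	fmt.Println(a, b)
}
```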

@@ -500,6 +510,8 @@ func init() {
gatewayRelayerKeyPassPhrase = NodeCmd.Flags().String("gatewayRelayerKeyPassPhrase", "", "Pass phrase used to unarmor the gateway relayer key file")

subscribeToVAAs = NodeCmd.Flags().Bool("subscribeToVAAs", false, "Guardiand should subscribe to incoming signed VAAs, set to true if running a public RPC node")

transferVerifierEnabledChains = NodeCmd.Flags().String("transferVerifierEnabledChains", "", "Transfer Verifier will be enabled for these chain IDs (comma-separated)")
Contributor

Why not use Flags().IntSlice()?

Contributor Author

@johnsaigle johnsaigle Mar 4, 2025

Done using UintSlice in the latest commit

- Avoid modifying the message or changing error handling in the
  watcher's message publication flow when transfer verifier is disabled
- Remove continue statements when verifyAndPublish returns an error in
  the reobservation flow
- Modify code comments around VerificationState
- Use UintSlice when parsing CLI flags
- Change new SDK constants to be strings instead of common.Address to be
  consistent with other SDK values
@@ -500,6 +510,8 @@ func init() {
gatewayRelayerKeyPassPhrase = NodeCmd.Flags().String("gatewayRelayerKeyPassPhrase", "", "Pass phrase used to unarmor the gateway relayer key file")

subscribeToVAAs = NodeCmd.Flags().Bool("subscribeToVAAs", false, "Guardiand should subscribe to incoming signed VAAs, set to true if running a public RPC node")

transferVerifierEnabledChains = NodeCmd.Flags().UintSlice("transferVerifierEnabledChains", make([]uint, 2), "Transfer Verifier will be enabled for these chain IDs (comma-separated)")
Collaborator

What's the idea behind the default case?

Contributor Author

Hm my intention is that you can invoke the guardian with e.g.:

txVerifierChains=1,2

but not txVerifierChains= (blank)

If you mean the make here, I gave it a length and capacity of 2 more or less arbitrarily. I guess I was thinking that if we enable Ethereum and Sui, that's two chains. The value could easily be 0 or 10 or anything else.
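
One detail worth keeping in mind for this thread: in Go, `make([]uint, 2)` yields a slice with length 2 whose elements are both zero, not an empty slice with room for two. If that slice is used as a flag default, code that reads the flag may see `[0 0]` when the flag is not supplied. The sketch below shows this, together with a hypothetical `toChainIDs` validator; treating 0 as an invalid chain ID is an assumption made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// ChainID is a simplified stand-in for vaa.ChainID.
type ChainID uint16

// toChainIDs converts raw flag values into chain IDs. It rejects 0 (assumed
// here to mean "unset") and anything that does not fit in a uint16.
func toChainIDs(raw []uint) ([]ChainID, error) {
	ids := make([]ChainID, 0, len(raw))
	for _, r := range raw {
		if r == 0 || r > math.MaxUint16 {
			return nil, fmt.Errorf("invalid chain ID %d", r)
		}
		ids = append(ids, ChainID(r))
	}
	return ids, nil
}

func main() {
	// A length-2 default holds two zero values.
	def := make([]uint, 2)
	fmt.Println(len(def), def)

	// An empty default avoids phantom entries.
	empty := []uint{}
	fmt.Println(len(empty))

	// Validation catches the zero-valued default.
	_, err := toChainIDs(def)
	fmt.Println(err != nil)

	ids, err := toChainIDs([]uint{1, 2})
	fmt.Println(ids, err)
}
```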

// KnownWrappedNativeAddress is a map of wrapped native addresses by chain ID, e.g. WETH for Ethereum
var KnownWrappedNativeAddress = map[vaa.ChainID]string{
// WETH
vaa.ChainIDEthereum: "0xc8f93d9738e7Ad5f3aF8c548DB2f6B7F8082B5e8",
Contributor

Don't know what I'm missing, but WETH on eth mainnet is at 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2

Contributor Author

Yeah you're right, I think this was a bad copy-paste from the testnet or devnet constant

// Channel to send new messages to.
// Channel for sending new MessagePublications. Messages should not be sent
// to this channel directly. Instead, they should be wrapped by
// a call to `publishIfSafe()`.
Contributor

Update to verifyAndPublish()


// Fetch the constants for the Token Bridge and the WETH address from the SDK.
var tbridge []byte
var weth string
Contributor

I think we should rename weth to wnative?

weth = sdk.KnownTestnetWrappedNativeAddresses[w.chainID]
case common.MainNet:
tbridge = sdk.KnownTokenbridgeEmitters[w.chainID]
// https://etherscan.io/token/0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2
Contributor

remove comment

Comment on lines +292 to +293
CoreBridgeAddr: w.contract,
TokenBridgeAddr: eth_common.BytesToAddress(tbridge[:]),
Contributor

In my code I added a KnownMainnetCoreContracts to the constants file. I think it just makes sense. You don't have to do it here, but I'm going to keep the change on my branch.

Contributor Author

Why not use KnownTokenbridgeEmitters?

Comment on lines +299 to +300
pruneHeightDelta := uint64(20)
var tvErr error
Contributor

I can hear Jeff and Bruce saying this needs to be a constant.

Collaborator

This needs to be a constant!

Contributor Author

the prune height delta you mean?

5 participants