
feat: add EVM indexer #274

Merged
merged 6 commits into from May 16, 2024

Conversation

Sekhmet
Contributor

@Sekhmet Sekhmet commented Nov 21, 2023

Summary

Closes: #188
Closes: https://github.com/snapshot-labs/pitches/issues/29

To make indexers extractable, the indexing logic was separated into an Indexer class whose instance the consumer initiates. This allows better separation (no need to pull in dependencies for all networks). Everything is currently exported under either an evm or a starknet object; it can be extracted later (we might first need to extract some common things into another package). We also have network-specific types for writers.

These things are not handled right now:

  • Reorgs
  • Indexed event parameters probably don't work (we don't use them in our contracts, but better support should be looked into).

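The per-network separation described above can be sketched roughly like this (the class and method names here are illustrative, not the actual checkpoint API): each network ships its own indexer behind a shared base class, so a consumer only pulls in the dependencies for the network it actually uses.

```typescript
// Hypothetical sketch of the per-network Indexer separation; names are
// illustrative, not the real @snapshot-labs/checkpoint API.

interface Writer {
  (event: { name: string; data: unknown }): Promise<void> | void;
}

abstract class BaseIndexer {
  constructor(protected writers: Record<string, Writer>) {}
  abstract networkId(): string;
}

class EvmIndexer extends BaseIndexer {
  networkId(): string {
    return 'evm';
  }
}

class StarknetIndexer extends BaseIndexer {
  networkId(): string {
    return 'starknet';
  }
}

// The consumer instantiates only the network-specific indexer it needs,
// so dependencies for other networks never have to be loaded.
const indexer: BaseIndexer = new EvmIndexer({
  ProxyDeployed: event => console.log('handle', event.name)
});
console.log(indexer.networkId()); // → evm
```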
Test plan

query {
    proxies (orderBy: created_at_block, orderDirection: desc) {
        id
        implementation
        deployer
        tx_hash
        created_at
        created_at_block
    }
}

@Sekhmet Sekhmet self-assigned this Nov 21, 2023
@Sekhmet Sekhmet marked this pull request as ready for review November 22, 2023 18:16
@bonustrack
Contributor

I'm trying to index another contract on Goerli to test, but the eth_getLogs request times out after a few minutes. Here is the error:
[error screenshot]
I've tried running it a couple of times but still end up with the same error at the same block. The last_fetched_block is stuck at 6109472. Here is the branch: https://github.com/checkpoint-labs/checkpoint-template/tree/fabien/evm-poster
Usually a request times out because there are too many events within the requested block range, but I don't see any tx within the requested range; txs start from block 6623932: https://goerli.etherscan.io/txs?a=0x000000000000cd17345801aa8147b8d3950260ff&p=11

Also, in the _metadatas table the network identifier is not correct, showing starknet_5.

@Sekhmet
Contributor Author

Sekhmet commented Nov 23, 2023

@bonustrack it seems to be some issue on Infura's side: it can never resolve the initial request, even though we are not making many requests and I even lowered the maximum range we use to 10 blocks. I asked them about it on Discord.

@bonustrack
Contributor

I've updated to the latest changes and even though the network id is fixed, it still updates last_fetched_block in chunks of 100. It's also extremely slow: it takes about 10 seconds to update 100 blocks. I tried with Alchemy and Infura and got the same performance; with Ankr it doesn't work / resolve. I've tried adding "disable_checkpoints": true to the config file, but it seems like it still queries and stores checkpoints, and it fails at some point with this error:

[error screenshot]

@Sekhmet
Contributor Author

Sekhmet commented Nov 23, 2023

> I've updated to the latest changes and even though the network id is fixed, it still updates last_fetched_block in chunks of 100.

That's expected; I only changed the maximum step used internally when handling requests.

I will check out the rest, but the getLogs call is generally slow and takes about 1s per call. So we should generally make as few requests as possible, but this might increase the likelihood of timeouts (though it seems it can time out regardless of range).

I assume that Ankr either doesn't support getLogs or fails differently from Infura when the response is too big (need to look into it).

I guess the issue will still be that we depend on the slow getLogs call. Maybe we need a better way of handling it: instead of fetching all logs, we fetch only those that match tracked contracts, and if new contracts are added to sources on the fly we go back. That might solve the issue.
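The "fetch only tracked contracts" idea can be sketched like this (the provider interface and names below are assumptions, not Checkpoint's actual API). eth_getLogs accepts an address filter, so the node can return only logs emitted by the contracts we track instead of every log in the range:

```typescript
// Sketch (assumed shapes) of fetching only logs emitted by tracked
// contracts, rather than all logs in the block range.
type Log = { address: string; blockNumber: number };

interface LogProvider {
  getLogs(filter: {
    fromBlock: number;
    toBlock: number;
    address?: string[];
  }): Promise<Log[]>;
}

async function fetchTrackedLogs(
  provider: LogProvider,
  tracked: string[],
  fromBlock: number,
  toBlock: number
): Promise<Log[]> {
  // Passing the address filter lets the node do the filtering, so the
  // response only contains events from contracts we care about.
  return provider.getLogs({ fromBlock, toBlock, address: tracked });
}

// Stub provider for demonstration; a real one would speak JSON-RPC.
const stub: LogProvider = {
  async getLogs(filter) {
    const all: Log[] = [
      { address: '0xaaa', blockNumber: 1 },
      { address: '0xbbb', blockNumber: 2 }
    ];
    return all.filter(
      l =>
        l.blockNumber >= filter.fromBlock &&
        l.blockNumber <= filter.toBlock &&
        (!filter.address || filter.address.includes(l.address))
    );
  }
};

fetchTrackedLogs(stub, ['0xaaa'], 0, 10).then(logs => console.log(logs.length)); // → 1
```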

@bonustrack
Contributor

bonustrack commented Nov 23, 2023

> I guess the issue will still be that we depend on the slow getLogs call. Maybe we need a better way of handling it: instead of fetching all logs, we fetch only those that match tracked contracts, and if new contracts are added to sources on the fly we go back. That might solve the issue.

Yes, I think this is the only way. If we can target specific contracts it will be more efficient; we wouldn't need to load every block's events. It might just be a bit tricky to implement: if we prefetch events and find a new deployment event, we can't continue the prefetch process, because we know one contract will be missing, and we would need to reset the prefetched events. And the logic to discover a new template contract address is part of a writer function, which in theory should run in the right order with the other writer functions. It sounds like the solution would be to prefetch events until a new deployment is found, then run the indexer up to this event, then continue the prefetch. But I'm curious whether you have other ideas.
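The prefetch-until-deployment flow described here can be sketched as a simple split (types and names below are hypothetical): process events up to and including the first deployment event, so its writer can register the new contract before prefetching resumes.

```typescript
// Hypothetical sketch of the prefetch strategy: stop the prefetched batch
// at the first contract-deployment event, process up to and including it
// (so its writer can add the new contract to tracked sources), then
// continue prefetching from the remainder.
type Event = { block: number; isDeployment: boolean };

function splitAtDeployment(events: Event[]): { ready: Event[]; rest: Event[] } {
  const idx = events.findIndex(e => e.isDeployment);
  if (idx === -1) return { ready: events, rest: [] };
  // Include the deployment event itself so its writer runs before we
  // resume fetching logs for the newly tracked contract.
  return { ready: events.slice(0, idx + 1), rest: events.slice(idx + 1) };
}

const batch: Event[] = [
  { block: 1, isDeployment: false },
  { block: 2, isDeployment: true },
  { block: 3, isDeployment: false }
];
const { ready, rest } = splitAtDeployment(batch);
console.log(ready.length, rest.length); // → 2 1
```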

I also think we will not find a perfect block range limit for eth_getLogs. I've used it in the past and it usually fails when there are too many events, which is something we can't predict: if there are 100 events within 2 blocks it may fail with a 2-block range, while it may work with a 10000-block range if there are just 10 events within that range. This also depends on node provider limits. I imagine this would require some kind of exponential backoff to deal with the range.
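The backoff idea can be sketched as a simple adaptive range (a sketch, not Checkpoint's implementation): shrink the block range when a request fails, and grow it back on success.

```typescript
// Minimal sketch of adaptive block-range backoff: halve the range when
// the provider rejects a request, double it (up to a cap) on success.
function nextRange(
  current: number,
  succeeded: boolean,
  min = 1,
  max = 10_000
): number {
  if (succeeded) return Math.min(current * 2, max);
  return Math.max(Math.floor(current / 2), min);
}

let range = 10_000;
range = nextRange(range, false); // provider failed: shrink → 5000
range = nextRange(range, false); // failed again → 2500
range = nextRange(range, true); // success: grow back → 5000
console.log(range);
```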

@Sekhmet
Contributor Author

Sekhmet commented Nov 24, 2023

> Yes, I think this is the only way. If we can target specific contracts it will be more efficient; we wouldn't need to load every block's events. It might just be a bit tricky to implement: if we prefetch events and find a new deployment event, we can't continue the prefetch process, because we know one contract will be missing, and we would need to reset the prefetched events. And the logic to discover a new template contract address is part of a writer function, which in theory should run in the right order with the other writer functions. It sounds like the solution would be to prefetch events until a new deployment is found, then run the indexer up to this event, then continue the prefetch. But I'm curious whether you have other ideas.

I think this was considered as one way of implementing it for Starknet before, but we decided to go the other way (I can't find the discussion anymore to see what the reasoning was). Ideally we should have consistent logic on all networks, at least to a certain degree.

> I also think we will not find a perfect block range limit for eth_getLogs. I've used it in the past and it usually fails when there are too many events, which is something we can't predict: if there are 100 events within 2 blocks it may fail with a 2-block range, while it may work with a 10000-block range if there are just 10 events within that range. This also depends on node provider limits. I imagine this would require some kind of exponential backoff to deal with the range.

Infura handles that and tells you the new range to try, and we handle it:

currentFrom = parseInt(body.error.data.from, 16);
currentTo = Math.min(
  parseInt(body.error.data.to, 16),
  currentFrom + MAX_BLOCKS_PER_REQUEST
);
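Wrapped up as a runnable sketch (the error shape below is illustrative of Infura's error.data response, not a general standard): retry with the provider-suggested range, capped at our own maximum.

```typescript
// Sketch: when the provider rejects a range as too large and suggests a
// narrower one (as Infura does via error.data), compute the retry range,
// capped at our own per-request maximum. The error shape is illustrative.
const MAX_BLOCKS_PER_REQUEST = 100;

function narrowRange(error: {
  data: { from: string; to: string };
}): { from: number; to: number } {
  const from = parseInt(error.data.from, 16);
  const to = Math.min(
    parseInt(error.data.to, 16),
    from + MAX_BLOCKS_PER_REQUEST
  );
  return { from, to };
}

// Provider suggests 0x10 (16) to 0x1000 (4096); we cap at 16 + 100 = 116.
console.log(narrowRange({ data: { from: '0x10', to: '0x1000' } })); // → { from: 16, to: 116 }
```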

@bonustrack
Contributor

bonustrack commented Nov 24, 2023

@Sekhmet I just found the discussion, it's here: https://discord.com/channels/1088202060283007137/1088202061176381492/1113515047277318214

Haven't read it yet, but I believe we can do such change on both Starknet and EVM chains.

> Infura handles that and tells you the new range to try, and we handle it:

Ok, but if it's specific to Infura it wouldn't be ideal; we can't assume that everyone, including us, will use Infura.

@Sekhmet
Contributor Author

Sekhmet commented Nov 28, 2023

I spent some time debugging the timeouts, and for some reason it very rarely happens when using curl (I've only been able to reproduce it while running checkpoint in the background), but I was able to reproduce it using a simple script with node-fetch:
https://gist.github.com/Sekhmet/a8f3bb693f51fb7fdf713ce5e6253ff7

This also happens when using an address filter and a topics filter, so I'm unsure whether we can do anything about it. I provided those PoCs to Infura for debugging.

@bonustrack
Contributor

If you like, I can send you endpoints from other providers to see whether this issue is only on Infura's side. Even if this gets resolved quickly, I think syncing all events from a chain is never going to be fast; on a chain like Arbitrum, where the block time is small, it will become even more problematic. We most likely need to make the change we discussed previously and sync only relevant contracts.

To make indexers extractable it was separated into an Indexer class
whose instance the consumer initiates. This allows better separation
(no need to pull in deps for all networks) - everything is currently
exported under either an evm or a starknet object; it can be extracted
later (we might need to extract some common things first into another
package). We also have specific writer types for each network.

This could also be useful in the future if we put multiple APIs
in a single instance of checkpoint: it could accept indexers instead
of just a single indexer.
@Sekhmet
Contributor Author

Sekhmet commented May 15, 2024

@bonustrack it would be great if we could review this; it would be nice to have the refactor land on master to avoid conflicts.

public opts?: CheckpointOptions;
public schema: string;

private readonly entityController: GqlEntityController;
private readonly log: Logger;
private readonly networkProvider: BaseProvider;
private readonly indexer: EvmIndexer;
Contributor

Any reason to have something EVM specific in this file?

Contributor Author

Good catch, it should be BaseIndexer.

@bonustrack
Contributor

bonustrack commented May 15, 2024

What do you mean by "Indexed event parameters probably don't work"? Is it about events coming from any contract (not defined in the config)?

@bonustrack
Contributor

bonustrack commented May 15, 2024

It seems to work well. When I compared events I saw that I have one extra event, 26 instead of 25, and I'm not sure why. This is the event that I get; it's the very first event detected by Checkpoint on this contract, and I'm not sure why it's missing on Etherscan:

{
  "id": "0x3fBc546BC7Fcf1e6eC7dAdfe6eBCf3c3ad2713ed",
  "implementation": "0xC3031A7d3326E47D49BfF9D374d74f364B29CE4D",
  "deployer": "0x556B14CbdA79A36dC33FcD461a04A5BCb5dC2A70",
  "tx_hash": "0xa97e863f7089dc5ee3852edaf911d5eb4098c8908ba3b9756c16164f1b5caf4d",
  "created_at": 1713349560,
  "created_at_block": 5717246
}

@Sekhmet
Contributor Author

Sekhmet commented May 15, 2024

@bonustrack it shows the last 25 transactions by default, and this is the 26th.
https://sepolia.etherscan.io/txs?a=0x4b4f7f64be813ccc66aefc3bfce2baa01188631c

There are more transactions; we just index them from a specific block.

@bonustrack
Contributor

Ok, actually I was looking here and saw only 25 events: https://sepolia.etherscan.io/address/0x4b4f7f64be813ccc66aefc3bfce2baa01188631c#events
Are you saying there are actually 147 txs and events on this contract, but the sync starts at a specific block, so it's correct that I have 26 proxies on my Checkpoint test instance?

@Sekhmet
Contributor Author

Sekhmet commented May 16, 2024

Yes, I didn't want to make it index everything, but we could. I was checking it by looking at all transactions on Etherscan starting at block 5717246 (the one we have configured) and making sure all of them are also present in Checkpoint.

@bonustrack bonustrack left a comment
tACK

@Sekhmet Sekhmet merged commit 2a34aec into master May 16, 2024
1 check passed
@Sekhmet Sekhmet deleted the sekhmet/evm-support branch May 16, 2024 12:15
Successfully merging this pull request may close these issues.

feat: support Ethereum events indexing