
feat: add EVM indexer #274

Merged
merged 6 commits into from May 16, 2024

Conversation

Sekhmet
Contributor

@Sekhmet Sekhmet commented Nov 21, 2023

Summary

Closes: #188
Closes: https://github.com/snapshot-labs/pitches/issues/29

To make indexers extractable, the indexing logic was separated into an Indexer class whose instance the consumer initiates. This allows better separation (no need to pull in dependencies for all networks). Everything is currently exported under either an evm or a starknet object; it can be extracted later (we might first need to extract some common things into another package). We also have network-specific types for writers.

These things are not handled right now:

  • Reorgs
  • Indexed event parameters probably don't work (we don't use them in our contracts, but better support should be looked into).

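The per-network separation described above can be sketched roughly like this (the class and method names here are illustrative, not the actual checkpoint API): each network ships its own indexer behind a shared base class, so a consumer only pulls in the dependencies for the network it actually uses.

```typescript
// Hypothetical sketch of the per-network Indexer separation; names are
// illustrative, not the real @snapshot-labs/checkpoint API.

interface Writer {
  (event: { name: string; data: unknown }): Promise<void> | void;
}

abstract class BaseIndexer {
  constructor(protected writers: Record<string, Writer>) {}
  abstract networkId(): string;
}

class EvmIndexer extends BaseIndexer {
  networkId(): string {
    return 'evm';
  }
}

class StarknetIndexer extends BaseIndexer {
  networkId(): string {
    return 'starknet';
  }
}

// The consumer instantiates only the network-specific indexer it needs,
// so dependencies for other networks never have to be loaded.
const indexer: BaseIndexer = new EvmIndexer({
  ProxyDeployed: event => console.log('handle', event.name)
});
console.log(indexer.networkId()); // → evm
```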
Test plan

query {
    proxies (orderBy: created_at_block, orderDirection: desc) {
        id
        implementation
        deployer
        tx_hash
        created_at
        created_at_block
    }
}

@Sekhmet Sekhmet self-assigned this Nov 21, 2023
@Sekhmet Sekhmet marked this pull request as ready for review November 22, 2023 18:16
@bonustrack
Contributor

I'm trying to index another contract on Goerli to test, but the eth_getLogs request times out after a few minutes. Here is the error:
[error screenshot]
I've tried running it a couple of times but still end up with the same error at the same block. The last_fetched_block is stuck at 6109472. Here is the branch: https://github.com/checkpoint-labs/checkpoint-template/tree/fabien/evm-poster
Usually a request times out because there are too many events within the requested block range, but I don't see any tx within the requested range; txs start from block 6623932: https://goerli.etherscan.io/txs?a=0x000000000000cd17345801aa8147b8d3950260ff&p=11

Also, in the _metadatas table the network identifier is not correct, showing starknet_5.

@Sekhmet
Contributor Author

Sekhmet commented Nov 23, 2023

@bonustrack it seems to be some issue on Infura's side: it can never resolve the initial request, even though we are not making many requests and I even lowered the maximum range we use to 10 blocks. I asked them about it on Discord.

@bonustrack
Contributor

I've updated to the latest changes and even though the network id is fixed, it still updates last_fetched_block in chunks of 100. It's also extremely slow: it takes about 10 seconds to update 100 blocks. I tried with Alchemy and Infura and got the same performance; with Ankr it doesn't work / resolve. I've tried adding "disable_checkpoints": true to the config file, but it seems like it still queries and stores checkpoints, and it fails at some point with this error:

[error screenshot]

@Sekhmet
Contributor Author

Sekhmet commented Nov 23, 2023

> I've updated to the latest changes and even though the network id is fixed, it still updates last_fetched_block in chunks of 100.

That's expected; I only changed the maximum step used internally when handling requests.

I will check out the rest, but the getLogs call is generally slow and takes about 1s per call. So we should generally make as few requests as possible, but this might increase the likelihood of timeouts (though it seems it can time out regardless of range).

I assume that Ankr either doesn't support getLogs or fails differently from Infura when the response is too big (need to look into it).

I guess the issue will still be that we depend on the slow getLogs call. Maybe we need a better way of handling it: instead of fetching all logs, we fetch only those that match tracked contracts, and if new contracts are added to sources on the fly we go back. That might solve the issue.
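The "fetch only tracked contracts" idea can be sketched like this (the provider interface and names below are assumptions, not Checkpoint's actual API). eth_getLogs accepts an address filter, so the node can return only logs emitted by the contracts we track instead of every log in the range:

```typescript
// Sketch (assumed shapes) of fetching only logs emitted by tracked
// contracts, rather than all logs in the block range.
type Log = { address: string; blockNumber: number };

interface LogProvider {
  getLogs(filter: {
    fromBlock: number;
    toBlock: number;
    address?: string[];
  }): Promise<Log[]>;
}

async function fetchTrackedLogs(
  provider: LogProvider,
  tracked: string[],
  fromBlock: number,
  toBlock: number
): Promise<Log[]> {
  // Passing the address filter lets the node do the filtering, so the
  // response only contains events from contracts we care about.
  return provider.getLogs({ fromBlock, toBlock, address: tracked });
}

// Stub provider for demonstration; a real one would speak JSON-RPC.
const stub: LogProvider = {
  async getLogs(filter) {
    const all: Log[] = [
      { address: '0xaaa', blockNumber: 1 },
      { address: '0xbbb', blockNumber: 2 }
    ];
    return all.filter(
      l =>
        l.blockNumber >= filter.fromBlock &&
        l.blockNumber <= filter.toBlock &&
        (!filter.address || filter.address.includes(l.address))
    );
  }
};

fetchTrackedLogs(stub, ['0xaaa'], 0, 10).then(logs => console.log(logs.length)); // → 1
```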

@bonustrack
Contributor

bonustrack commented Nov 23, 2023

> I guess the issue will still be that we depend on the slow getLogs call. Maybe we need a better way of handling it: instead of fetching all logs, we fetch only those that match tracked contracts, and if new contracts are added to sources on the fly we go back. That might solve the issue.

Yes, I think this is the only way. If we can target specific contracts it will be more efficient; we wouldn't need to load every block's events. It might just be a bit tricky to implement: if we prefetch events and find a new deployment event, we can't continue the prefetch process, because we know one contract will be missing, and we would need to reset the prefetched events. And the logic to discover a new template contract address is part of a writer function, which in theory should run in the right order with the other writer functions. It sounds like the solution would be to prefetch events until a new deployment is found, then run the indexer up to this event, then continue the prefetch. But I'm curious whether you have other ideas.
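The prefetch-until-deployment flow described here can be sketched as a simple split (types and names below are hypothetical): process events up to and including the first deployment event, so its writer can register the new contract before prefetching resumes.

```typescript
// Hypothetical sketch of the prefetch strategy: stop the prefetched batch
// at the first contract-deployment event, process up to and including it
// (so its writer can add the new contract to tracked sources), then
// continue prefetching from the remainder.
type Event = { block: number; isDeployment: boolean };

function splitAtDeployment(events: Event[]): { ready: Event[]; rest: Event[] } {
  const idx = events.findIndex(e => e.isDeployment);
  if (idx === -1) return { ready: events, rest: [] };
  // Include the deployment event itself so its writer runs before we
  // resume fetching logs for the newly tracked contract.
  return { ready: events.slice(0, idx + 1), rest: events.slice(idx + 1) };
}

const batch: Event[] = [
  { block: 1, isDeployment: false },
  { block: 2, isDeployment: true },
  { block: 3, isDeployment: false }
];
const { ready, rest } = splitAtDeployment(batch);
console.log(ready.length, rest.length); // → 2 1
```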

I also think we will not find a perfect block range limit for eth_getLogs. I've used it in the past and it usually fails when there are too many events, which is something we can't predict: if there are 100 events within 2 blocks it may fail with a 2-block range, while it may work with a 10000-block range if there are just 10 events within that range. This also depends on node provider limits. I imagine this would require some kind of exponential backoff to deal with the range.
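The backoff idea can be sketched as a simple adaptive range (a sketch, not Checkpoint's implementation): shrink the block range when a request fails, and grow it back on success.

```typescript
// Minimal sketch of adaptive block-range backoff: halve the range when
// the provider rejects a request, double it (up to a cap) on success.
function nextRange(
  current: number,
  succeeded: boolean,
  min = 1,
  max = 10_000
): number {
  if (succeeded) return Math.min(current * 2, max);
  return Math.max(Math.floor(current / 2), min);
}

let range = 10_000;
range = nextRange(range, false); // provider failed: shrink → 5000
range = nextRange(range, false); // failed again → 2500
range = nextRange(range, true); // success: grow back → 5000
console.log(range);
```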

@Sekhmet
Contributor Author

Sekhmet commented Nov 24, 2023

> Yes, I think this is the only way. If we can target specific contracts it will be more efficient; we wouldn't need to load every block's events. It might just be a bit tricky to implement: if we prefetch events and find a new deployment event, we can't continue the prefetch process, because we know one contract will be missing, and we would need to reset the prefetched events. And the logic to discover a new template contract address is part of a writer function, which in theory should run in the right order with the other writer functions. It sounds like the solution would be to prefetch events until a new deployment is found, then run the indexer up to this event, then continue the prefetch. But I'm curious whether you have other ideas.

I think this was considered as one way of implementing it for Starknet before, but we decided to go the other way (I can't find the discussion anymore to see what the reasoning was). Ideally we should have consistent logic on all networks, at least to a certain degree.

> I also think we will not find a perfect block range limit for eth_getLogs. I've used it in the past and it usually fails when there are too many events, which is something we can't predict: if there are 100 events within 2 blocks it may fail with a 2-block range, while it may work with a 10000-block range if there are just 10 events within that range. This also depends on node provider limits. I imagine this would require some kind of exponential backoff to deal with the range.

Infura handles that and tells you the new range to try, and we handle it:

currentFrom = parseInt(body.error.data.from, 16);
currentTo = Math.min(
  parseInt(body.error.data.to, 16),
  currentFrom + MAX_BLOCKS_PER_REQUEST
);
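Wrapped up as a runnable sketch (the error shape below is illustrative of Infura's error.data response, not a general standard): retry with the provider-suggested range, capped at our own maximum.

```typescript
// Sketch: when the provider rejects a range as too large and suggests a
// narrower one (as Infura does via error.data), compute the retry range,
// capped at our own per-request maximum. The error shape is illustrative.
const MAX_BLOCKS_PER_REQUEST = 100;

function narrowRange(error: {
  data: { from: string; to: string };
}): { from: number; to: number } {
  const from = parseInt(error.data.from, 16);
  const to = Math.min(
    parseInt(error.data.to, 16),
    from + MAX_BLOCKS_PER_REQUEST
  );
  return { from, to };
}

// Provider suggests 0x10 (16) to 0x1000 (4096); we cap at 16 + 100 = 116.
console.log(narrowRange({ data: { from: '0x10', to: '0x1000' } })); // → { from: 16, to: 116 }
```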

@bonustrack
Contributor

bonustrack commented Nov 24, 2023

@Sekhmet I just found the discussion, it's here: https://discord.com/channels/1088202060283007137/1088202061176381492/1113515047277318214

Haven't read it yet, but I believe we can do such change on both Starknet and EVM chains.

> Infura handles that and tells you the new range to try, and we handle it:

Ok, but if it's specific to Infura it wouldn't be ideal; we can't assume that everyone, including us, will use Infura.

@Sekhmet
Contributor Author

Sekhmet commented Nov 28, 2023

I spent some time debugging the timeouts, and for some reason it very rarely happens when using curl (I've only been able to reproduce it while running checkpoint in the background), but I was able to reproduce it using a simple script with node-fetch:
https://gist.github.com/Sekhmet/a8f3bb693f51fb7fdf713ce5e6253ff7

This also happens when using an address filter and a topics filter, so I'm unsure whether we can do anything about it. I provided those PoCs to Infura for debugging.

@bonustrack
Contributor

If you like, I can send you endpoints from other providers to see whether this issue is only on Infura's side. Even if this gets resolved quickly, I think syncing all events from a chain is never going to be fast; on a chain like Arbitrum, where the block time is small, it will become even more problematic. We most likely need to make the change we discussed previously and sync only relevant contracts.

To make indexers extractable it was separated into an Indexer class
whose instance the consumer initiates. This allows better separation
(no need to pull in deps for all networks) - everything is currently
exported under either an evm or a starknet object; it can be extracted
later (we might need to extract some common things first into another
package). We also have specific writer types for each network.

This could also be useful in the future if we put multiple APIs
in a single instance of checkpoint: it could accept indexers instead
of just a single indexer.
@Sekhmet
Contributor Author

Sekhmet commented May 15, 2024

@bonustrack it would be great if we could review this; it would be nice to have the refactor land on master to avoid conflicts.

public opts?: CheckpointOptions;
public schema: string;

private readonly entityController: GqlEntityController;
private readonly log: Logger;
private readonly networkProvider: BaseProvider;
private readonly indexer: EvmIndexer;
Contributor

Any reason to have something EVM specific in this file?

Contributor Author

Good catch, it should be BaseIndexer.

@bonustrack
Contributor

bonustrack commented May 15, 2024

What do you mean by "Indexed event parameters probably don't work"? Is it about events coming from any contract (not defined in the config)?

@bonustrack
Contributor

bonustrack commented May 15, 2024

It seems to work well. When I compared events I saw that I have one extra event, 26 instead of 25, and I'm not sure why. This is the event that I get; it's the very first event detected by Checkpoint on this contract, and I'm not sure why it's missing on Etherscan:

{
  "id": "0x3fBc546BC7Fcf1e6eC7dAdfe6eBCf3c3ad2713ed",
  "implementation": "0xC3031A7d3326E47D49BfF9D374d74f364B29CE4D",
  "deployer": "0x556B14CbdA79A36dC33FcD461a04A5BCb5dC2A70",
  "tx_hash": "0xa97e863f7089dc5ee3852edaf911d5eb4098c8908ba3b9756c16164f1b5caf4d",
  "created_at": 1713349560,
  "created_at_block": 5717246
}

@Sekhmet
Contributor Author

Sekhmet commented May 15, 2024

@bonustrack it shows the last 25 transactions by default, and this is the 26th.
https://sepolia.etherscan.io/txs?a=0x4b4f7f64be813ccc66aefc3bfce2baa01188631c

There are more transactions; we just index them from a specific block.

@bonustrack
Contributor

Ok, actually I was looking here and saw only 25 events: https://sepolia.etherscan.io/address/0x4b4f7f64be813ccc66aefc3bfce2baa01188631c#events
Are you saying there are actually 147 txs and events on this contract, but the sync starts at a specific block, so it's correct that I have 26 proxies on my Checkpoint test instance?

@Sekhmet
Contributor Author

Sekhmet commented May 16, 2024

Yes, I didn't want to make it index everything, but we could. I was checking it by looking at all transactions on Etherscan starting at block 5717246 (the one we have configured) and making sure all of them are also present in Checkpoint.

@bonustrack bonustrack left a comment
tACK

@Sekhmet Sekhmet merged commit 2a34aec into master May 16, 2024
1 check passed
@Sekhmet Sekhmet deleted the sekhmet/evm-support branch May 16, 2024 12:15
Successfully merging this pull request may close these issues.

feat: support Ethereum events indexing