perf: remove TrieUpdates::removed_nodes and StorageTrieUpdates::removed_nodes (attempt 2) #13929

Draft
wants to merge 3 commits into
base: main
from

Conversation

kien-rise
Contributor

@kien-rise kien-rise commented Jan 22, 2025

Replaces #13872. This PR is only slightly worse than #13872, but the code changes are minimal.

Checklist
  • E2E Benchmarks
  • Add benches/*
  • Improve PR description

Motivation

Under high load (TPS > 50,000), MemoryOverlayStateProviderRef::trie_state takes a considerable amount of time (over 200 ms). One contributing factor is the TrieUpdates::extend_ref function. Changing the struct definition of TrieUpdates could make extend_ref faster.

Solution

First, this PR changes the struct definitions of TrieUpdates and StorageTrieUpdates:

pub struct TrieUpdates {
-    pub account_nodes: HashMap<Nibbles, BranchNodeCompact>,
-    pub removed_nodes: HashSet<Nibbles>,
+    pub changed_nodes: HashMap<Nibbles, Option<BranchNodeCompact>>,
     pub storage_tries: B256HashMap<StorageTrieUpdates>,
 }

pub struct StorageTrieUpdates {
     pub is_deleted: bool,
-    pub storage_nodes: HashMap<Nibbles, BranchNodeCompact>,
-    pub removed_nodes: HashSet<Nibbles>,
+    pub changed_nodes: HashMap<Nibbles, Option<BranchNodeCompact>>,
 }
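
A minimal sketch of the merged representation, assuming `Some(node)` marks an inserted or updated node and `None` marks a removed one. `Path`, `Node`, and `ChangedNodes` are hypothetical stand-ins for `Nibbles`, `BranchNodeCompact`, and the `changed_nodes` field; none of this is code from the PR.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for reth's `Nibbles` and `BranchNodeCompact`.
type Path = Vec<u8>;

#[derive(Clone, Debug, PartialEq)]
struct Node(u64);

/// One map instead of two collections (assumed encoding):
/// `Some(node)` = inserted or updated, `None` = removed.
#[derive(Default, Debug)]
struct ChangedNodes(HashMap<Path, Option<Node>>);

impl ChangedNodes {
    /// Record a new or updated node; overwrites an earlier removal of the same path.
    fn update(&mut self, path: Path, node: Node) {
        self.0.insert(path, Some(node));
    }

    /// Record a removal; overwrites an earlier update of the same path.
    fn remove(&mut self, path: Path) {
        self.0.insert(path, None);
    }

    /// Paths currently marked as removed (the role the old `removed_nodes` set played).
    fn removed(&self) -> impl Iterator<Item = &Path> + '_ {
        self.0.iter().filter(|(_, v)| v.is_none()).map(|(k, _)| k)
    }

    /// Paths that carry a node (the role the old `account_nodes` map played).
    fn updated(&self) -> impl Iterator<Item = (&Path, &Node)> + '_ {
        self.0.iter().filter_map(|(k, v)| v.as_ref().map(|n| (k, n)))
    }
}
```

Because each path has exactly one entry, a path cannot be listed as both updated and removed at the same time; the most recent write decides.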

Next, this PR replaces the following steps in fn TrieUpdates::extend_ref:

  1. self.account_nodes.retain(|nibbles, _| !other.removed_nodes.contains(nibbles));
  2. self.account_nodes.extend(exclude_empty_from_pair(other.account_nodes.iter().map(|(k, v)| (k.clone(), v.clone()))));
  3. self.removed_nodes.extend(exclude_empty(other.removed_nodes.iter().cloned()));

with a single step:

  1. self.changed_nodes.extend(exclude_empty_from_pair(other.changed_nodes.iter().map(|(k, v)| (k.clone(), v.clone()))));

Similar changes are applied to StorageTrieUpdates::extend_ref.
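
The difference between the two merge strategies can be sketched as follows. This is a simplified illustration, not the PR's code: `Path`, `Node`, `OldUpdates`, and `NewUpdates` are hypothetical stand-ins, and the `exclude_empty` / `exclude_empty_from_pair` filtering is left out.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for `Nibbles` and `BranchNodeCompact`.
type Path = Vec<u8>;
type Node = u64;

/// Shape before the change: updates and removals live in separate collections.
#[derive(Default)]
struct OldUpdates {
    account_nodes: HashMap<Path, Node>,
    removed_nodes: HashSet<Path>,
}

impl OldUpdates {
    fn extend_ref(&mut self, other: &Self) {
        // `retain` visits every entry already in `self`, even when `other` is small.
        self.account_nodes.retain(|path, _| !other.removed_nodes.contains(path));
        self.account_nodes.extend(other.account_nodes.iter().map(|(k, v)| (k.clone(), *v)));
        self.removed_nodes.extend(other.removed_nodes.iter().cloned());
    }
}

/// Shape after the change: a removal is an entry whose value is `None`.
#[derive(Default)]
struct NewUpdates {
    changed_nodes: HashMap<Path, Option<Node>>,
}

impl NewUpdates {
    fn extend_ref(&mut self, other: &Self) {
        // One pass over `other`; a `None` from `other` overwrites a stale `Some` in `self`.
        self.changed_nodes.extend(other.changed_nodes.iter().map(|(k, v)| (k.clone(), *v)));
    }
}
```

In this sketch the old merge does work proportional to the size of `self` plus `other`, while the new merge only touches the entries of `other`. That is a plausible reason for the speedup, although the PR does not break the cost down this way.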

Criterion Benchmarks

Note: the current criterion benchmark does not produce very stable results.

[Before]
ERC20                   time:   [503.56 ms 503.85 ms 504.13 ms]
Raw Transfer            time:   [205.55 ms 205.87 ms 206.19 ms]
Uniswap                 time:   [55.124 ms 55.255 ms 55.393 ms]

[After]
ERC20                   time:   [476.01 ms 476.36 ms 476.72 ms]
Raw Transfer            time:   [112.22 ms 112.33 ms 112.49 ms]
Uniswap                 time:   [31.368 ms 31.551 ms 31.738 ms]

E2E Benchmarks

Total run time for state root calculation (2)

(unit: μs)      Before        After         Ratio (After/Before)
ERC20           585111076     523309918     0.8943770499
Raw Transfer    323457659     298599422     0.923148405
Uniswap         1254878698    1151877682    0.9179195438
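
The Ratio column appears to be After divided by Before, so values below 1 mean less time spent in state root calculation. A small sketch of that arithmetic, with hypothetical helper names:

```rust
/// After/Before ratio; a value below 1.0 means the "After" run took less time.
fn ratio(before_us: u64, after_us: u64) -> f64 {
    after_us as f64 / before_us as f64
}

/// Share of the "Before" time that was saved, in percent.
fn percent_saved(before_us: u64, after_us: u64) -> f64 {
    (1.0 - ratio(before_us, after_us)) * 100.0
}
```

For the ERC20 row, `ratio(585111076, 523309918)` is about 0.894, which corresponds to roughly 10.6% less time.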
Before and After

Before

  • erc20-from16-1914-super-low-dependency.zip (30600/150000)
    30,601.00 tps, 1,056,914,565.83 gps, no chain lag, 585111076/609686114 μs
  • raw-transfer-from05-1723-super-low-dependency.zip (54600/280000)
    54,594.73 tps, 1,146,512,812.78 gps, no chain lag, 323457659/332195794 μs
  • uniswap-from14-1848-super-low-dependency.zip (6330/25320)
    6,331.00 tps, 882,451,082.10 gps, no chain lag, 1254878698/1270738569 μs

After

  • erc20-from16-1914-super-low-dependency.zip (30600/150000)
    30,601.00 tps, 1,056,921,850.84 gps, no chain lag, 523309918/561875948 μs
  • raw-transfer-from05-1723-super-low-dependency.zip (54600/280000)
    54,601.00 tps, 1,146,644,512.25 gps, no chain lag, 298599422/315966032 μs
  • uniswap-from14-1848-super-low-dependency.zip (6330/25320)
    6,331.00 tps, 882,450,529.38 gps, no chain lag, 1151877682/1180946313 μs
Other benchmarks
"785bc168-9976731c-erc20-30600"
{"tps": 30601.0, "gps": 1056921850.8367347, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-erc20-31000"
{"tps": 31001.0, "gps": 1070736094.1406412, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-erc20-32000"
{"tps": 32001.0, "gps": 1105272921.4311633, "is_chain_lagged": false, "chain_lag_distance": 1}
"785bc168-9976731c-erc20-33000"
{"tps": 33001.0, "gps": 1139812889.5115511, "is_chain_lagged": false, "chain_lag_distance": 1}
"785bc168-9976731c-erc20-34000"
{"tps": 34001.0, "gps": 1174346187.1292517, "is_chain_lagged": true, "chain_lag_distance": 4}
"785bc168-9976731c-erc20-36000"
{"tps": 36001.0, "gps": 1243424508.7788463, "is_chain_lagged": true, "chain_lag_distance": 20}
"785bc168-9976731c-erc20-40000"
{"tps": 40001.0, "gps": 1381586516.224, "is_chain_lagged": true, "chain_lag_distance": 61}
"785bc168-9976731c-raw-transfer-54600"
{"tps": 54601.0, "gps": 1146644512.2513661, "is_chain_lagged": false, "chain_lag_distance": 1}
"785bc168-9976731c-raw-transfer-55000"
{"tps": 55001.0, "gps": 1155044519.4926472, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-raw-transfer-56000"
{"tps": 56001.0, "gps": 1176044515.2560747, "is_chain_lagged": false, "chain_lag_distance": 1}
"785bc168-9976731c-raw-transfer-57000"
{"tps": 57001.0, "gps": 1197044511.5361216, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-raw-transfer-58000"
{"tps": 58001.0, "gps": 1218044525.3953488, "is_chain_lagged": false, "chain_lag_distance": 1}
"785bc168-9976731c-raw-transfer-60000"
{"tps": 60001.0, "gps": 1260044514.0, "is_chain_lagged": true, "chain_lag_distance": 5}
"785bc168-9976731c-raw-transfer-64000"
{"tps": 64001.0, "gps": 1344044510.4796574, "is_chain_lagged": true, "chain_lag_distance": 26}
"785bc168-9976731c-uniswap-6330"
{"tps": 6331.0, "gps": 882450529.3786408, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-uniswap-6400"
{"tps": 6401.0, "gps": 892212483.2733675, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-uniswap-6480"
{"tps": 6481.0, "gps": 903381443.1144955, "is_chain_lagged": false, "chain_lag_distance": 1}
"785bc168-9976731c-uniswap-6500"
{"tps": 6501.0, "gps": 906146040.2489707, "is_chain_lagged": false, "chain_lag_distance": 0}
"785bc168-9976731c-uniswap-6540"
{"tps": 6541.0, "gps": 911705913.7027911, "is_chain_lagged": true, "chain_lag_distance": 2}
"785bc168-9976731c-uniswap-6580"
{"tps": 6581.0, "gps": 917291024.3644142, "is_chain_lagged": false, "chain_lag_distance": 1}
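
One way to read the records above is to look for the highest target rate that finished without chain lag. A minimal sketch, assuming the records are loaded into a hypothetical `Run` struct; only the ERC20 rows are transcribed here.

```rust
/// One benchmark record, mirroring the JSON fields above plus a workload tag.
#[derive(Debug, Clone, PartialEq)]
struct Run {
    workload: &'static str,
    tps: f64,
    is_chain_lagged: bool,
    chain_lag_distance: u32,
}

/// The ERC20 records listed above, transcribed by hand.
fn erc20_runs() -> Vec<Run> {
    let rows = [
        (30601.0, false, 0),
        (31001.0, false, 0),
        (32001.0, false, 1),
        (33001.0, false, 1),
        (34001.0, true, 4),
        (36001.0, true, 20),
        (40001.0, true, 61),
    ];
    rows.iter()
        .map(|&(tps, is_chain_lagged, chain_lag_distance)| Run {
            workload: "erc20",
            tps,
            is_chain_lagged,
            chain_lag_distance,
        })
        .collect()
}

/// Highest TPS among runs of `workload` that did not lag; `None` if there are none.
fn highest_unlagged_tps(runs: &[Run], workload: &str) -> Option<f64> {
    runs.iter()
        .filter(|r| r.workload == workload && !r.is_chain_lagged)
        .map(|r| r.tps)
        .fold(None, |best: Option<f64>, t| Some(best.map_or(t, |b| b.max(t))))
}
```

A simple maximum like this can be misleading when results are not monotonic: in the Uniswap records, the 6540 run lagged while the 6580 run did not.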

@kien-rise kien-rise marked this pull request as draft January 22, 2025 17:09
@emhane emhane added the C-perf A change motivated by improving speed, memory usage or disk footprint label Jan 22, 2025
@kien-rise kien-rise marked this pull request as ready for review January 23, 2025 20:03
@kien-rise kien-rise requested a review from gakonst as a code owner January 23, 2025 20:03
@kien-rise kien-rise changed the title from "[WIP] perf: remove TrieUpdates::removed_nodes and StorageTrieUpdates::removed_nodes (attempt 2)" to "perf: remove TrieUpdates::removed_nodes and StorageTrieUpdates::removed_nodes (attempt 2)" Jan 23, 2025
Collaborator

@mattsse mattsse left a comment

I think I almost understood this, but I'm unequipped to review this in detail.

ptal @rkrasiuk

imo if feasible we should try to get this in, because it makes sense why this is significantly more performant

Comment on lines -21 to -22
account_nodes: HashMap<Nibbles, EntryDiff<Option<BranchNodeCompact>>>,
removed_nodes: HashMap<Nibbles, EntryDiff<bool>>,
Collaborator

@mattsse mattsse Jan 24, 2025

this refactor isn't straightforward to me, unclear how account/removed nodes translate to task/regular/database

I'd appreciate a few additional docs

Comment on lines -213 to +164
.storage_nodes
.keys()
.chain(regular.storage_nodes.keys())
for key in Iterator::chain(task.changed_nodes.keys(), regular.changed_nodes.keys())
Collaborator

why do we need the fully qualified syntax here
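
For context, the two spellings in the diff are interchangeable as far as the language is concerned. A small sketch with hypothetical maps shows that `Iterator::chain(a, b)` and `a.chain(b)` yield the same items; why the PR chose the qualified form is not stated in the thread.

```rust
use std::collections::HashMap;

/// Method-call form: `a.keys().chain(b.keys())`.
fn keys_method(a: &HashMap<u8, u8>, b: &HashMap<u8, u8>) -> Vec<u8> {
    let mut keys: Vec<u8> = a.keys().chain(b.keys()).copied().collect();
    keys.sort_unstable(); // HashMap iteration order is unspecified, so sort for comparison
    keys
}

/// Fully qualified form: `Iterator::chain(a.keys(), b.keys())`.
fn keys_qualified(a: &HashMap<u8, u8>, b: &HashMap<u8, u8>) -> Vec<u8> {
    let mut keys: Vec<u8> = Iterator::chain(a.keys(), b.keys()).copied().collect();
    keys.sort_unstable();
    keys
}
```

Neither form deduplicates, so a key present in both maps appears twice in either result.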

Comment on lines +186 to +188
task: &Option<Option<BranchNodeCompact>>,
regular: &Option<Option<BranchNodeCompact>>,
database: &Option<BranchNodeCompact>,
Collaborator

I see that task/regular/database is most likely coming from here, so perhaps @rkrasiuk needs to fill in the blanks
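
One possible reading of the nested `Option` in this signature, offered as an assumption rather than as the PR's documented semantics: the outer `Option` says whether an update set has any entry for the path, and the inner `Option` says whether that entry is an update or a removal. Under that reading, a comparison could resolve each side against the database value first. `Node`, `effective`, and `nodes_agree` are hypothetical names.

```rust
// Hypothetical stand-in for `BranchNodeCompact`.
#[derive(Clone, Debug, PartialEq)]
struct Node(u64);

/// Resolve what a path would hold after applying one update set.
/// Assumed meaning of the nested Option:
///   None          -> the update set has no entry, so the database value stands
///   Some(None)    -> the update set removes the node
///   Some(Some(n)) -> the update set writes `n`
fn effective<'a>(
    update: &'a Option<Option<Node>>,
    database: &'a Option<Node>,
) -> Option<&'a Node> {
    match update {
        Some(changed) => changed.as_ref(),
        None => database.as_ref(),
    }
}

/// Two update sets agree on a path if they lead to the same effective node.
fn nodes_agree(
    task: &Option<Option<Node>>,
    regular: &Option<Option<Node>>,
    database: &Option<Node>,
) -> bool {
    effective(task, database) == effective(regular, database)
}
```

Under these assumptions, an update set that omits a path agrees with one that rewrites the path to the value already stored in the database.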

Comment on lines -15 to +14
/// Collection of removed intermediate account nodes indexed by full path.
#[cfg_attr(any(test, feature = "serde"), serde(with = "serde_nibbles_set"))]
pub removed_nodes: HashSet<Nibbles>,
pub changed_nodes: HashMap<Nibbles, Option<BranchNodeCompact>>,
Collaborator

also not immediately clear how removed translated to changed here

@kien-rise
Contributor Author

kien-rise commented Jan 24, 2025

I am gonna convert this PR to Draft (to prevent any accidental merge) because #13976 is (potentially) a simpler version.

@kien-rise kien-rise marked this pull request as draft January 24, 2025 16:17
Labels
C-perf A change motivated by improving speed, memory usage or disk footprint

3 participants