-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: remove TrieUpdates::removed_nodes and StorageTrieUpdates::removed_nodes (attempt 2) #13929
base: main
Are you sure you want to change the base?
Conversation
9f5db66
to
d0bee8b
Compare
d0bee8b
to
0d0cc88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I almost understood this, but I'm unequipped to review this in detail.
ptal @rkrasiuk
imo if feasible we should try to get this in, because it makes sense why this is significantly more performant
account_nodes: HashMap<Nibbles, EntryDiff<Option<BranchNodeCompact>>>, | ||
removed_nodes: HashMap<Nibbles, EntryDiff<bool>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this refactor isn't straight forward to me, unclear how account/removed nodes translate to task/regular/database
I'd appreciate a few additional docs
.storage_nodes | ||
.keys() | ||
.chain(regular.storage_nodes.keys()) | ||
for key in Iterator::chain(task.changed_nodes.keys(), regular.changed_nodes.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need the fully qualified syntax here
task: &Option<Option<BranchNodeCompact>>, | ||
regular: &Option<Option<BranchNodeCompact>>, | ||
database: &Option<BranchNodeCompact>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that task/regular/database is mostlikely coming from here, so perhaps @rkrasiuk needs to fill in the blanks
/// Collection of removed intermediate account nodes indexed by full path. | ||
#[cfg_attr(any(test, feature = "serde"), serde(with = "serde_nibbles_set"))] | ||
pub removed_nodes: HashSet<Nibbles>, | ||
pub changed_nodes: HashMap<Nibbles, Option<BranchNodeCompact>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not immediately clear how removed translated to changed here
I am gonna convert this PR to Draft (to prevent any accidental merge) because #13976 is (potentially) a simpler version. |
Replaces #13872. This PR is only slightly worse than #13872, but the code changes are minimal.
Checklist
benches/*
Motivation
Under high load (TPS > 50,000),
MemoryOverlayStateProviderRef::trie_state
is taking a considerable amount of time (over 200ms). One factor contributing to this is theTrieUpdates::extend_ref
function. Optimizing the struct definition ofTrieUpdates
could help improve the performance of theextend_ref
function.Solution
First, this PR changes the struct definitions of
TrieUpdates
andStorageTrieUpdates
:Next, this PR replaces the following steps in fn
TrieUpdates::extend_ref
:self.account_nodes.retain(|nibbles, _| !other.removed_nodes.contains(nibbles));
self.account_nodes.extend(exclude_empty_from_pair(other.account_nodes.iter().map(|(k, v)| (k.clone(), v.clone()))));
self.removed_nodes.extend(exclude_empty(other.removed_nodes.iter().cloned()));
by
self.account_nodes.extend(exclude_empty_from_pair(other.account_nodes.iter().map(|(k, v)| (k.clone(), v.clone()))));
Similar changes is applied to
StorageTrieUpdates::extend_ref
.Criterion Benchmarks
E2E Benchmarks
Before and After
Before
30,601.00 tps, 1,056,914,565.83 gps, no chain lag, 585111076/609686114 μs
54,594.73 tps, 1,146,512,812.78 gps, no chain lag, 323457659/332195794 μs
6,331.00 tps, 882,451,082.10 gps, no chain lag, 1254878698/1270738569 μs
After
30,601.00 tps, 1,056,921,850.84 gps, no chain lag, 523309918/561875948 μs
54,601.00 tps, 1,146,644,512.25 gps, no chain lag, 298599422 μs/315966032 μs
6,331.00 tps, 882,450,529.38 gps, no chain lag, 1151877682/1180946313 μs
Other benchmarks