Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dag] cache votes without re-counting #11808

Merged
merged 5 commits into from
Feb 9, 2024
Merged

Conversation

ibalajiarun
Copy link
Contributor

@ibalajiarun ibalajiarun commented Jan 28, 2024

Description

Instead of counting votes each time, this PR adds caching functionality for votes.

Also, checks for epoch when handling a RB node.

Test Plan

Fixed existing tests.

Copy link

trunk-io bot commented Jan 28, 2024

⏱️ 3h 25m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-tests 1h 15m 🟩🟩
windows-build 1h 9m 🟩🟩🟩
rust-smoke-tests 29m 🟩
rust-lints 11m 🟥🟥
run-tests-main-branch 7m 🟥🟥
check-dynamic-deps 6m 🟩🟩🟩
general-lints 5m 🟩🟩
semgrep/ci 53s 🟩🟩🟩
check 38s 🟥🟥
file_change_determinator 21s 🟩🟩
file_change_determinator 17s 🟩🟩
permission-check 9s 🟩🟩🟩
permission-check 8s 🟩🟩🟩
permission-check 5s 🟩🟩
permission-check 5s 🟩🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
windows-build 26m 18m +42%

settingsfeedbackdocs ⋅ learn more about trunk.io

}

for parent in node.parents_metadata() {
let voting_power = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move this outside, no?

.expect("must exist");
match node_status {
Some(NodeStatus::Unordered {
votes_culumative_voting_power,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call it weak_vote and strong_vote correspondingly

.expect("must exist");
let node_status = self
.get_node_ref_mut(parent.round(), parent.author())
.expect("must exist");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add another expect here instead of unreachable below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a mutable reference to node_status. If i expect, then I wont get a reference to the Option in BTreeMap.

_ = tx.take().expect("must exist").send(certificate);
}

if partial_signatures.signatures().len() == self.epoch_state.verifier.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comments here to explain why the aggregation is through a channel? I assume it's because we want to keep the rb alive even after cert?

@ibalajiarun ibalajiarun force-pushed the balaji/dag-count-votes-opt branch from a184dfa to ae6a329 Compare February 8, 2024 23:31
Base automatically changed from balaji/extended-rb to main February 9, 2024 06:41
@ibalajiarun ibalajiarun force-pushed the balaji/dag-count-votes-opt branch from ae6a329 to 9a52fb1 Compare February 9, 2024 15:28
@ibalajiarun ibalajiarun enabled auto-merge (squash) February 9, 2024 15:46

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Feb 9, 2024

✅ Forge suite compat success on aptos-node-v1.8.3 ==> e09ec1fbf02cfa539b4c274104cf122f4e77eb23

Compatibility test results for aptos-node-v1.8.3 ==> e09ec1fbf02cfa539b4c274104cf122f4e77eb23 (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 4737 txn/s, latency: 6749 ms, (p50: 6600 ms, p90: 9600 ms, p99: 17200 ms), latency samples: 180040
2. Upgrading first Validator to new version: e09ec1fbf02cfa539b4c274104cf122f4e77eb23
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1711 txn/s, latency: 16558 ms, (p50: 19200 ms, p90: 25000 ms, p99: 26200 ms), latency samples: 92440
3. Upgrading rest of first batch to new version: e09ec1fbf02cfa539b4c274104cf122f4e77eb23
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1839 txn/s, latency: 15653 ms, (p50: 19600 ms, p90: 21900 ms, p99: 22300 ms), latency samples: 91980
4. upgrading second batch to new version: e09ec1fbf02cfa539b4c274104cf122f4e77eb23
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3170 txn/s, latency: 9487 ms, (p50: 9900 ms, p90: 12200 ms, p99: 12900 ms), latency samples: 136340
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> e09ec1fbf02cfa539b4c274104cf122f4e77eb23 passed
Test Ok

Copy link
Contributor

github-actions bot commented Feb 9, 2024

✅ Forge suite realistic_env_max_load success on e09ec1fbf02cfa539b4c274104cf122f4e77eb23

two traffics test: inner traffic : committed: 7435 txn/s, latency: 5112 ms, (p50: 4600 ms, p90: 6400 ms, p99: 13300 ms), latency samples: 3212220
two traffics test : committed: 100 txn/s, latency: 2452 ms, (p50: 2100 ms, p90: 2500 ms, p99: 14500 ms), latency samples: 1740
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.274, avg: 0.210", "QsPosToProposal: max: 0.199, avg: 0.169", "ConsensusProposalToOrdered: max: 0.596, avg: 0.562", "ConsensusOrderedToCommit: max: 0.500, avg: 0.469", "ConsensusProposalToCommit: max: 1.069, avg: 1.031"]
Max round gap was 1 [limit 4] at version 1544694. Max no progress secs was 9.913752 [limit 15] at version 1544694.
Test Ok

@ibalajiarun ibalajiarun merged commit a83b5f7 into main Feb 9, 2024
77 of 80 checks passed
@ibalajiarun ibalajiarun deleted the balaji/dag-count-votes-opt branch February 9, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants