Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dag] concurrent dag store #11505

Merged
merged 4 commits into from
Feb 9, 2024
Merged

[dag] concurrent dag store #11505

merged 4 commits into from
Feb 9, 2024

Conversation

ibalajiarun
Copy link
Contributor

@ibalajiarun ibalajiarun commented Dec 22, 2023

Description

  • Separated in-memory component of Dag store from persistence.
  • PersistentDagStore handles both persistence and maintains the in-memory DAG store.

Test Plan

Copy link

trunk-io bot commented Dec 22, 2023

⏱️ 8h 12m total CI duration on this PR
Job Cumulative Duration Recent Runs
windows-build 3h 55m 🟥🟩🟩🟩🟩 (+6 more)
rust-unit-tests 2h 33m 🟥🟩🟥🟩🟩 (+1 more)
rust-lints 30m 🟥🟥🟥🟥🟥 (+2 more)
run-tests-main-branch 26m 🟩🟩🟩🟩 (+2 more)
check-dynamic-deps 22m 🟩🟩🟩🟩🟩 (+7 more)
general-lints 16m 🟩🟩🟩🟩🟩 (+1 more)
semgrep/ci 4m 🟩🟩🟩🟩🟩 (+8 more)
check 2m 🟥🟥🟥🟥 (+3 more)
file_change_determinator 1m 🟩🟩🟩🟩🟩 (+2 more)
file_change_determinator 1m 🟩🟩🟩 (+3 more)
permission-check 28s 🟩🟩🟩🟩🟩 (+6 more)
permission-check 25s 🟩🟩🟩🟩🟩 (+6 more)
permission-check 22s 🟩🟩🟩🟩 (+3 more)
permission-check 15s 🟩🟩🟩 (+3 more)

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
windows-build 30m 18m +63%

settingsfeedbackdocs ⋅ learn more about trunk.io

@ibalajiarun ibalajiarun force-pushed the balaji/concurrent-dag-store branch from 4d63114 to 97d4ab8 Compare December 22, 2023 23:52
@ibalajiarun ibalajiarun marked this pull request as ready for review December 22, 2023 23:53
@ibalajiarun ibalajiarun force-pushed the balaji/vote-backpressure branch from 78bf7e0 to 2140a13 Compare January 8, 2024 23:39
@ibalajiarun ibalajiarun force-pushed the balaji/concurrent-dag-store branch from d314c8e to 74562ed Compare January 8, 2024 23:39
}
}

pub fn write(&self) -> DagWriteGuard {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is redundant

@ibalajiarun ibalajiarun force-pushed the balaji/vote-backpressure branch from 2140a13 to 39e4b84 Compare January 10, 2024 22:23
@ibalajiarun ibalajiarun force-pushed the balaji/concurrent-dag-store branch from a215a13 to 7f86eb9 Compare January 10, 2024 23:40
@ibalajiarun ibalajiarun force-pushed the balaji/concurrent-dag-store branch 2 times, most recently from 60bacf0 to 78ffa8c Compare January 23, 2024 19:07
@ibalajiarun ibalajiarun requested a review from sasha8 January 23, 2024 19:07
@ibalajiarun ibalajiarun force-pushed the balaji/vote-backpressure branch from 39e4b84 to 9480471 Compare January 25, 2024 21:59
@ibalajiarun ibalajiarun force-pushed the balaji/concurrent-dag-store branch from a6e6ee3 to 0d87fcc Compare January 25, 2024 21:59
@@ -46,7 +44,7 @@ use tokio_retry::strategy::ExponentialBackoff;
pub(crate) struct DagDriver {
author: Author,
epoch_state: Arc<EpochState>,
dag: Arc<RwLock<Dag>>,
dag: Arc<PersistentDagStore>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dag -> persistenet_dag to avoid confusion with the inner dag?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
You could take
let round_ref = self
.nodes_by_round
.entry(round)
.or_insert_with(|| vec![None; self.author_to_index.len()]);
out of validate and lock only that.

@sasha8
Copy link
Contributor

sasha8 commented Jan 27, 2024

Much cleaner now!

Ok(())
}

fn validate_new_node(&mut self, node: &CertifiedNode) -> anyhow::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it looks a bit weird to have &mut self for validation, but probably it makes insertion simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it makes insertion and also last validation step simpler.

@@ -408,3 +372,114 @@ impl Dag {
self.nodes_by_round.is_empty() && self.start_round > 1
}
}

pub struct PersistentDagStore {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this DagStore and Dag -> InMemDag

let node = Arc::new(node);
let round_ref = self
.get_node_ref_mut(node.round(), node.author())
.expect("must be present");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is not true if pruning happens in between?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, found this recently as well. we need a ensure! here for the round here. Also, need to fix the lowest_round() and highest_round() implementations.

Base automatically changed from balaji/vote-backpressure to main February 6, 2024 06:54
@ibalajiarun ibalajiarun force-pushed the balaji/concurrent-dag-store branch from 0d87fcc to e769867 Compare February 6, 2024 19:31
@ibalajiarun ibalajiarun requested a review from zekun000 February 6, 2024 19:31
@ibalajiarun ibalajiarun enabled auto-merge (squash) February 8, 2024 23:43

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Feb 9, 2024

✅ Forge suite compat success on aptos-node-v1.8.3 ==> e769867995ebf107fee3ee8026361a221f00f581

Compatibility test results for aptos-node-v1.8.3 ==> e769867995ebf107fee3ee8026361a221f00f581 (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 4140 txn/s, latency: 6798 ms, (p50: 5600 ms, p90: 8700 ms, p99: 22100 ms), latency samples: 182160
2. Upgrading first Validator to new version: e769867995ebf107fee3ee8026361a221f00f581
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1781 txn/s, latency: 16052 ms, (p50: 19300 ms, p90: 22200 ms, p99: 22500 ms), latency samples: 92620
3. Upgrading rest of first batch to new version: e769867995ebf107fee3ee8026361a221f00f581
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1837 txn/s, latency: 15666 ms, (p50: 19100 ms, p90: 21900 ms, p99: 22300 ms), latency samples: 91860
4. upgrading second batch to new version: e769867995ebf107fee3ee8026361a221f00f581
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3499 txn/s, latency: 9016 ms, (p50: 9800 ms, p90: 13500 ms, p99: 13900 ms), latency samples: 139960
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> e769867995ebf107fee3ee8026361a221f00f581 passed
Test Ok

Copy link
Contributor

github-actions bot commented Feb 9, 2024

✅ Forge suite realistic_env_max_load success on e769867995ebf107fee3ee8026361a221f00f581

two traffics test: inner traffic : committed: 7302 txn/s, latency: 5237 ms, (p50: 4800 ms, p90: 6900 ms, p99: 12900 ms), latency samples: 3161980
two traffics test : committed: 100 txn/s, latency: 2176 ms, (p50: 2100 ms, p90: 2500 ms, p99: 4800 ms), latency samples: 1760
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.227, avg: 0.203", "QsPosToProposal: max: 0.149, avg: 0.138", "ConsensusProposalToOrdered: max: 0.588, avg: 0.544", "ConsensusOrderedToCommit: max: 0.497, avg: 0.471", "ConsensusProposalToCommit: max: 1.052, avg: 1.015"]
Max round gap was 1 [limit 4] at version 1500894. Max no progress secs was 4.985149 [limit 15] at version 1500894.
Test Ok

@ibalajiarun ibalajiarun merged commit 3163686 into main Feb 9, 2024
97 of 118 checks passed
@ibalajiarun ibalajiarun deleted the balaji/concurrent-dag-store branch February 9, 2024 00:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants