-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dag] refactor/improve dag health monitoring #11362
Conversation
⏱️ 10h total CI duration on this PR
🚨 1 job on the last run was significantly faster/slower than expected
|
cfd24a5
to
e653a70
Compare
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite. |
f394e70
to
b443006
Compare
consensus/src/dag/bootstrap.rs
Outdated
Arc<dyn AnchorElection>, | ||
Option<Arc<LeaderReputationAdapter>>, | ||
Option<Vec<CommitEvent>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possibly ugly. I couldn't cleanly separate them in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably should always populate the commit events (for the sake of completeness)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it look better, we can create another trait to get voting power ratio and having a no-op one always returns 1 so here can return Arc<dyn AnchorElection>, Arc<dyn CommitHistory>, Vec<CommitEvent>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that we can populate always. How many events do we read? For AnchorElectionMode::LeaderReputation
we have a config with num events to read, but not for RoundRobin Leader. Also, it seems unnecessary to read in the latter case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, introduced the CommitHistory trait.
fec87c9
to
ce5ed52
Compare
b443006
to
e9bcd97
Compare
89b8b17
to
82f79bd
Compare
e9bcd97
to
697fbe6
Compare
697fbe6
to
b0ba9d6
Compare
b0ba9d6
to
1a912a9
Compare
consensus/src/dag/bootstrap.rs
Outdated
Arc<dyn AnchorElection>, | ||
Option<Arc<LeaderReputationAdapter>>, | ||
Option<Vec<CommitEvent>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably should always populate the commit events (for the sake of completeness)
consensus/src/dag/bootstrap.rs
Outdated
Arc<dyn AnchorElection>, | ||
Option<Arc<LeaderReputationAdapter>>, | ||
Option<Vec<CommitEvent>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it look better, we can create another trait to get voting power ratio and having a no-op one always returns 1 so here can return Arc<dyn AnchorElection>, Arc<dyn CommitHistory>, Vec<CommitEvent>
consensus/src/dag/health/backoff.rs
Outdated
.unwrap_or((u64::MAX, u64::MAX)); | ||
let voting_power_ratio = self.chain_health.voting_power_ratio(round); | ||
|
||
let max_txns_per_round = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's probably better to use []
than vec![]
here
ordered_notifier.clone(), | ||
); | ||
let health_backoff = | ||
HealthBackoff::new(self.epoch_state.clone(), chain_health, pipeline_health); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none of these actually stops the progress fully right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, next PR stops voting.
1a912a9
to
022dce0
Compare
consensus/src/dag/round_state.rs
Outdated
}; | ||
|
||
let wait_time = self.minimal_wait_time.max(minimum_delay); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 163:
I think even if voting power == 3f+1 we need to wait for minimum_delay.
Currently this is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, fixed it.
022dce0
to
d93c02a
Compare
c6a0cbf
to
694d0e5
Compare
consensus/src/dag/adapter.rs
Outdated
@@ -89,6 +93,7 @@ pub(super) struct OrderedNotifierAdapter { | |||
parent_block_info: Arc<RwLock<BlockInfo>>, | |||
epoch_state: Arc<EpochState>, | |||
ledger_info_provider: Arc<RwLock<LedgerInfoProvider>>, | |||
block_created_ts: Arc<RwLock<BTreeMap<Round, Instant>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name is confusing, I think you use block ordered time instead of creation time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The block is created here for the first time, so I named it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh because dag doesn't have block, but still it's easy to confuse with jolteon block creation, I'd just call it block_ordered_ts
) -> ( | ||
Arc<dyn AnchorElection>, | ||
Arc<dyn CommitHistory>, | ||
Option<Vec<CommitEvent>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
roughly remember we discussed this, the abstraction here looks weird. I'd imagine we have a concrete CommitHistory
struct that can return CommitEvent, and AnchorElection can use the CommitHistory (impl MetadataBackend) internally for reputation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, will follow-up
.min() | ||
.expect("must not be empty"); | ||
|
||
let max_txns = max_txns_per_round.saturating_div( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, do we verify payload limit on receivers today? this seems making it impossible to verify because of different views?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a TODO
52d77cf
to
901b273
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
Description
Test Plan