[Indexer-Grpc-V2] Add MetadataManager. #15727

Merged

grao1991 merged 1 commit into main from grao_metadata_manager on Jan 18, 2025

grao1991 (Contributor) commented Jan 14, 2025

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Jan 14, 2025

⏱️ 3h 42m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
execution-performance / single-node-performance 1h 4m 🟥🟥🟩🟩
check-dynamic-deps 35m 🟩🟩🟩🟩🟩 (+8 more)
rust-cargo-deny 22m 🟩🟩🟩🟩🟩 (+8 more)
test-target-determinator 16m 🟩🟩🟩
execution-performance / test-target-determinator 16m 🟩🟩🟩🟩
forge-compat-test / forge 15m 🟩
forge-e2e-test / forge 13m 🟩
fetch-last-released-docker-image-tag 7m 🟩🟩🟩🟩
general-lints 6m 🟩🟩🟩🟩🟩 (+8 more)
rust-doc-tests 5m 🟩
rust-doc-tests 5m 🟩
rust-doc-tests 5m 🟩
semgrep/ci 5m 🟩🟩🟩🟩🟩 (+8 more)
rust-doc-tests 3m
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+8 more)

🚨 2 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
execution-performance / single-node-performance 25m 14m +74%
test-target-determinator 4m 5m -23%

grao1991 force-pushed the grao_metadata_manager branch from f66cc55 to bff3fa1 on January 14, 2025 02:41
grao1991 requested a review from larry-aptos on January 14, 2025 02:43
grao1991 marked this pull request as ready for review on January 14, 2025 02:43
grao1991 force-pushed the grao_metadata_manager branch from bff3fa1 to 501f87e on January 14, 2025 02:44
}

// TODO(grao): This is a magic number, consider a different algorithm here.
let capacity = std::cmp::max(candidates.iter().map(|c| c.1).max().unwrap() + 2, 20);

Using unwrap() here could panic if candidates is empty, since max() returns None for an empty iterator. Consider using unwrap_or(0) to handle the empty case safely:

let capacity = std::cmp::max(candidates.iter().map(|c| c.1).max().unwrap_or(0) + 2, 20);

Spotted by Graphite Reviewer
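
The suggested fix can be checked in isolation. A minimal sketch, assuming `candidates` is a slice of `(address, active_stream_count)` tuples; the tuple layout and the `compute_capacity` helper are hypothetical, and only the expression itself comes from the diff. `Iterator::max()` returns `None` only for an empty iterator, so an all-zero list still yields `Some(0)` and a capacity of 20.

```rust
/// Hypothetical helper mirroring the capacity expression under review.
/// `candidates` holds (address, active_stream_count) pairs.
fn compute_capacity(candidates: &[(String, u64)]) -> u64 {
    // `max()` yields `None` for an empty slice; fall back to 0 instead of panicking.
    let max_streams = candidates.iter().map(|c| c.1).max().unwrap_or(0);
    // Headroom of 2 above the busiest candidate, but never below 20.
    std::cmp::max(max_streams + 2, 20)
}

fn main() {
    let empty: Vec<(String, u64)> = Vec::new();
    let idle: Vec<(String, u64)> = vec![("a".to_string(), 0), ("b".to_string(), 0)];
    let busy: Vec<(String, u64)> = vec![("a".to_string(), 30), ("b".to_string(), 5)];
    println!("{}", compute_capacity(&empty)); // 20
    println!("{}", compute_capacity(&idle)); // 20
    println!("{}", compute_capacity(&busy)); // 32
}
```

The floor of 20 still dominates for small fleets, which is the "magic number" the TODO refers to.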


tokio_scoped::scope(|s| {
s.spawn(async move {
self.metadata_manager.start().await.unwrap();

Using unwrap() on the async task could cause a panic if the metadata manager fails. Consider propagating the error instead with self.metadata_manager.start().await?

Spotted by Graphite Reviewer
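
The same concern can be illustrated without tokio. The sketch below uses `std::thread::scope` as a synchronous stand-in for `tokio_scoped::scope` (a swapped-in analog, not the PR's code): the spawned task returns a `Result`, and the error is surfaced at the join point with `?` instead of panicking inside the task. `MetadataManager`, its `should_fail` field and the `String` error type are hypothetical placeholders.

```rust
use std::thread;

/// Hypothetical stand-in for the real metadata manager.
struct MetadataManager {
    should_fail: bool,
}

impl MetadataManager {
    /// Placeholder for the long-running `start()`; returns an error instead of panicking.
    fn start(&self) -> Result<(), String> {
        if self.should_fail {
            Err("metadata manager failed".to_string())
        } else {
            Ok(())
        }
    }
}

fn run(manager: &MetadataManager) -> Result<(), String> {
    thread::scope(|s| -> Result<(), String> {
        // The spawned task hands its Result back rather than calling `unwrap()` on it.
        let handle = s.spawn(|| manager.start());
        // `join()` only fails if the thread itself panicked; map that to an error as well.
        let task_result = handle
            .join()
            .map_err(|_| "metadata manager thread panicked".to_string())?;
        // Propagate the task's own error to the caller.
        task_result
    })
}

fn main() {
    println!("{:?}", run(&MetadataManager { should_fail: false }));
    println!("{:?}", run(&MetadataManager { should_fail: true }));
}
```

With tokio the shape would be similar: the `async move` block returns a `Result`, and whoever awaits the task decides whether a failure should abort the service.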

grao1991 force-pushed the grao_metadata_manager branch 3 times, most recently from 79c4698 to 43ecd36 on January 16, 2025 02:38
grao1991 force-pushed the grao_metadata_manager branch 3 times, most recently from 8c77fc8 to 20819b0 on January 16, 2025 06:47
grao1991 enabled auto-merge (squash) on January 16, 2025 18:28

}
});

tokio::time::sleep(Duration::from_secs(1)).await;

Contributor

Why do we need to ping so often? Can we sleep longer here?

Contributor Author

I actually want to make it more frequent since it's cheap. Under load a lot of things can happen in 1s.

.entry(address.clone())
.or_insert(LiveDataService::new(address));
entry.value_mut().recent_states.push_back(info);
if entry.value().recent_states.len() > MAX_NUM_OF_STATES_TO_KEEP {

Contributor

Why do we need to keep more than 1 state history?

Contributor Author

One sample use case is calculating TPS. This is especially useful for debugging, since the history carries much more information and granularity than the Grafana metric.
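
A minimal sketch of the bounded history and the TPS use case, assuming each recorded state carries a timestamp and the latest transaction version the service reported. The `State` struct, its fields, and the `record`/`tps` helpers are hypothetical; only `MAX_NUM_OF_STATES_TO_KEEP` and the push-then-trim pattern come from the diff. Keeping a window of states, rather than one, lets TPS be averaged over many samples.

```rust
use std::collections::VecDeque;
use std::time::Duration;

const MAX_NUM_OF_STATES_TO_KEEP: usize = 100;

/// Hypothetical snapshot of a data service's state.
#[derive(Clone, Copy, Debug)]
struct State {
    /// Time since an arbitrary epoch when the state was observed.
    timestamp: Duration,
    /// Latest transaction version the service reported.
    version: u64,
}

/// Appends a state and drops the oldest one once the window is full.
fn record(history: &mut VecDeque<State>, state: State) {
    history.push_back(state);
    if history.len() > MAX_NUM_OF_STATES_TO_KEEP {
        history.pop_front();
    }
}

/// Average transactions per second between the oldest and newest retained state.
fn tps(history: &VecDeque<State>) -> Option<f64> {
    let first = history.front()?;
    let last = history.back()?;
    let elapsed = last.timestamp.checked_sub(first.timestamp)?.as_secs_f64();
    if elapsed <= 0.0 {
        return None;
    }
    // Versions should only grow; bail out instead of underflowing if they do not.
    let txns = last.version.checked_sub(first.version)?;
    Some(txns as f64 / elapsed)
}

fn main() {
    let mut history = VecDeque::new();
    // Simulate 150 one-second samples at 1,000 transactions per second.
    for i in 0..150u64 {
        record(&mut history, State { timestamp: Duration::from_secs(i), version: i * 1_000 });
    }
    println!("kept {} states, tps = {:?}", history.len(), tps(&history));
}
```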

use tonic::transport::channel::Channel;
use tracing::trace;

const MAX_NUM_OF_STATES_TO_KEEP: usize = 100;

Contributor

Adding some comments here would be helpful.

Contributor Author

Done

@@ -20,6 +20,9 @@ pub(crate) struct ServiceConfig {
pub struct IndexerGrpcManagerConfig {
pub(crate) chain_id: u64,
pub(crate) service_config: ServiceConfig,
pub(crate) self_advertised_address: String,

Contributor

nit: either give it a dedicated type instead of String, or use SocketAddr/Url.

Contributor Author

Added a type.
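
As a sketch of the reviewer's suggestion, a newtype can validate the address once at parse time instead of passing a raw `String` around. The `GrpcAddress` name and the http/https scheme check are hypothetical; the thread does not show which type was actually added.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical newtype for `self_advertised_address`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct GrpcAddress(String);

impl FromStr for GrpcAddress {
    type Err = String;

    /// Accepts only addresses with an explicit http:// or https:// scheme and a non-empty remainder.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let rest = s
            .strip_prefix("http://")
            .or_else(|| s.strip_prefix("https://"))
            .ok_or_else(|| format!("missing http(s) scheme: {s}"))?;
        if rest.is_empty() {
            return Err(format!("missing host: {s}"));
        }
        Ok(GrpcAddress(s.to_string()))
    }
}

impl fmt::Display for GrpcAddress {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

fn main() {
    let ok: Result<GrpcAddress, _> = "http://127.0.0.1:50051".parse();
    let bad: Result<GrpcAddress, _> = "127.0.0.1:50051".parse();
    println!("{:?} {:?}", ok, bad);
}
```

A config loader would then reject a malformed address at startup rather than when the first connection is attempted.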

for kv in &self.fullnodes {
let (address, fullnode) = kv.pair();
let need_ping = fullnode.recent_states.back().map_or(true, |s| {
Self::need_ping(s.timestamp.unwrap_or_default(), Duration::from_secs(1))

Contributor

thought: make these times/durations configurable; we may need to experiment with these numbers.

Contributor Author

I'm trying to avoid putting too many things into the config, since that confuses people. Let's keep it here for now and only move it to the config if we find it really necessary.
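
The check in the diff pings a fullnode when it has no recorded state (`map_or(true, ...)`) or when its latest state is older than the interval. A minimal sketch of that decision, assuming timestamps are `std::time::SystemTime` values (the real code passes a protobuf timestamp through `unwrap_or_default()`). This `need_ping` signature and the `should_ping` wrapper are hypothetical, with the current time passed in explicitly to keep them testable.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical ping decision: ping if the last state is at least `interval` old.
fn need_ping(last_seen: SystemTime, now: SystemTime, interval: Duration) -> bool {
    match now.duration_since(last_seen) {
        Ok(elapsed) => elapsed >= interval,
        // `last_seen` is in the future (e.g. clock skew); treat it as fresh.
        Err(_) => false,
    }
}

/// Mirrors `recent_states.back().map_or(true, ...)`: with no history, always ping.
fn should_ping(latest: Option<SystemTime>, now: SystemTime, interval: Duration) -> bool {
    latest.map_or(true, |ts| need_ping(ts, now, interval))
}

fn main() {
    let now = SystemTime::now();
    let interval = Duration::from_secs(1);
    println!("{}", should_ping(None, now, interval)); // true
    println!("{}", should_ping(Some(now - Duration::from_millis(200)), now, interval)); // false
    println!("{}", should_ping(Some(now - Duration::from_secs(2)), now, interval)); // true
}
```

Keeping `interval` as a parameter leaves the 1 s value hard-coded at the call site, as the author prefers, while making it easy to lift into the config later.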

Contributor

larry-aptos left a comment

approve to unblock; going to take another round of review

grao1991 force-pushed the grao_metadata_manager branch 2 times, most recently from ac527bd to 66619b8 on January 17, 2025 19:22
grao1991 requested a review from sitalkedia on January 17, 2025 19:22

grao1991 force-pushed the grao_metadata_manager branch from 66619b8 to 05e56a8 on January 18, 2025 01:32

Contributor

✅ Forge suite realistic_env_max_load success on 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d

two traffics test: inner traffic : committed: 14473.51 txn/s, latency: 2737.00 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5503140
two traffics test : committed: 99.97 txn/s, latency: 1480.85 ms, (p50: 1400 ms, p70: 1500, p90: 1600 ms, p99: 1900 ms), latency samples: 1800
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.548, avg: 1.480", "ConsensusProposalToOrdered: max: 0.295, avg: 0.291", "ConsensusOrderedToCommit: max: 0.411, avg: 0.397", "ConsensusProposalToCommit: max: 0.701, avg: 0.688"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.62s no progress at version 23196 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.63s no progress at version 2387390 (avg 0.63s) [limit 16].
Test Ok

Contributor

✅ Forge suite compat success on 17540fad8e88ab5681f3a91190b9f5d37e53d2ef ==> 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d

Compatibility test results for 17540fad8e88ab5681f3a91190b9f5d37e53d2ef ==> 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d (PR)
1. Check liveness of validators at old version: 17540fad8e88ab5681f3a91190b9f5d37e53d2ef
compatibility::simple-validator-upgrade::liveness-check : committed: 16325.86 txn/s, latency: 2016.48 ms, (p50: 2100 ms, p70: 2200, p90: 2300 ms, p99: 2500 ms), latency samples: 523680
2. Upgrading first Validator to new version: 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 3469.28 txn/s, latency: 8386.31 ms, (p50: 8600 ms, p70: 11700, p90: 12100 ms, p99: 12100 ms), latency samples: 77500
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4680.89 txn/s, latency: 7002.98 ms, (p50: 7600 ms, p70: 8200, p90: 8400 ms, p99: 8400 ms), latency samples: 168480
3. Upgrading rest of first batch to new version: 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 1960.45 txn/s, latency: 13340.73 ms, (p50: 15100 ms, p70: 18500, p90: 19900 ms, p99: 20400 ms), latency samples: 56500
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4354.67 txn/s, latency: 7798.76 ms, (p50: 8700 ms, p70: 8900, p90: 9300 ms, p99: 9500 ms), latency samples: 152880
4. upgrading second batch to new version: 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 8250.71 txn/s, latency: 3713.96 ms, (p50: 4300 ms, p70: 4400, p90: 4600 ms, p99: 4700 ms), latency samples: 151860
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 7939.89 txn/s, latency: 4293.33 ms, (p50: 4500 ms, p70: 4700, p90: 5000 ms, p99: 5200 ms), latency samples: 267660
5. check swarm health
Compatibility test for 17540fad8e88ab5681f3a91190b9f5d37e53d2ef ==> 05e56a863d9bd7e0d0bf4e791c27cd451a40c08d passed
Test Ok

grao1991 merged commit 824511d into main on Jan 18, 2025
43 of 46 checks passed
grao1991 deleted the grao_metadata_manager branch on January 18, 2025 02:38

4 participants