[forge] increase load of graceful overload #15159

Merged
merged 1 commit into from
Nov 5, 2024

@bchocho bchocho commented Nov 1, 2024

Description

As TPS has improved, increase the offered TPS of the graceful overload test so that it actually exercises overload. (The current "overload" level could almost be reached at peak TPS.)

How Has This Been Tested?

Run the test, see TPS -- it now shows overload
https://github.com/aptos-labs/aptos-core/actions/runs/11636241940

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Nov 1, 2024

⏱️ 1h 6m total CI duration on this PR

| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
|---|---|---|
| forge-compat-test / forge | 24m | 🟥🟩 |
| test-target-determinator | 9m | 🟩🟩 |
| rust-doc-tests | 5m | 🟩 |
| execution-performance / test-target-determinator | 5m | 🟩 |
| check | 4m | 🟩 |
| check-dynamic-deps | 4m | 🟩🟩🟩🟩🟩 |
| rust-cargo-deny | 4m | 🟩🟩 |
| determine-test-metadata | 2m | 🟩 |
| semgrep/ci | 2m | 🟩🟩🟩🟩🟩 |
| fetch-last-released-docker-image-tag | 2m | 🟩 |
| rust-move-tests | 2m | 🟩 |
| rust-move-tests | 2m | 🟩 |
| general-lints | 56s | 🟩🟩 |
| file_change_determinator | 33s | 🟩🟩🟩 |
| file_change_determinator | 25s | 🟩🟩 |

@bchocho bchocho added the CICD:build-images, CICD:build-failpoints-images, and CICD:build-performance-images labels Nov 1, 2024
@bchocho bchocho requested review from igor-aptos and hariria November 1, 2024 22:27
@bchocho bchocho marked this pull request as ready for review November 1, 2024 22:27
@@ -238,7 +238,7 @@ pub(crate) fn realistic_env_graceful_overload(duration: Duration) -> ForgeConfig
         .with_initial_fullnode_count(20)
         .add_network_test(wrap_with_realistic_env(num_validators, TwoTrafficsTest {
             inner_traffic: EmitJobRequest::default()
-                .mode(EmitJobMode::ConstTps { tps: 15000 })
+                .mode(EmitJobMode::ConstTps { tps: 30000 })
@vusirikala vusirikala Nov 1, 2024

As the TPS is increased, should we be updating the success criteria for the test as well?
Is the test already passing with the current success criteria?

@bchocho (Contributor, Author) replied:

The test was passing "too well" -- almost at 14K TPS. Now that we've made it overload, it is at 11K TPS. To avoid test-failure noise, though, I'll change the success criteria only after we see a few runs.
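
The figures in this exchange can be turned into a quick back-of-the-envelope check. The sketch below is illustrative only and is not part of the Forge codebase: `overload_factor` and `degrades_gracefully` are hypothetical helper names, the sustainable rate of roughly 14K TPS is taken from the reply above, and the 10K floor used in the usage note is an assumed placeholder, since the thread leaves the updated success criteria undecided.

```rust
/// Ratio of the offered load to the throughput the network can sustain.
/// A value close to 1.0 means the test barely overloads the system;
/// a value well above 1.0 means it is a genuine overload.
/// (Hypothetical helper, not a Forge API.)
pub fn overload_factor(offered_tps: f64, sustainable_tps: f64) -> f64 {
    assert!(sustainable_tps > 0.0, "sustainable TPS must be positive");
    offered_tps / sustainable_tps
}

/// Hypothetical success check: under overload, committed throughput
/// should stay at or above some agreed floor rather than collapse.
pub fn degrades_gracefully(committed_tps: f64, min_committed_tps: f64) -> bool {
    committed_tps >= min_committed_tps
}
```

With a sustainable rate of about 14K TPS, `overload_factor(15000.0, 14000.0)` is roughly 1.07, so the old setting sat only about 7% above capacity, while `overload_factor(30000.0, 14000.0)` is roughly 2.14. Under the assumed 10K floor, the observed 11K TPS would pass `degrades_gracefully(11000.0, 10000.0)`.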


@igor-aptos igor-aptos left a comment

Good days, good days :)

@bchocho bchocho enabled auto-merge (squash) November 4, 2024 19:21

github-actions bot commented Nov 4, 2024

✅ Forge suite realistic_env_max_load success on af43a82eedb9a01e6999536863d2a3344127315c

two traffics test: inner traffic : committed: 14379.90 txn/s, latency: 2764.04 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3200 ms), latency samples: 5467720
two traffics test : committed: 99.97 txn/s, latency: 1553.41 ms, (p50: 1400 ms, p70: 1400, p90: 1600 ms, p99: 9900 ms), latency samples: 1780
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.051, avg: 1.591", "ConsensusProposalToOrdered: max: 0.318, avg: 0.293", "ConsensusOrderedToCommit: max: 0.370, avg: 0.356", "ConsensusProposalToCommit: max: 0.660, avg: 0.650"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.89s no progress at version 1890287 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.56s no progress at version 1890285 (avg 8.17s) [limit 15].
Test Ok
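
To compare committed throughput across several runs before settling on new success criteria, the `committed: … txn/s` field can be pulled out of summary lines like the ones above. This is a minimal sketch that assumes only the line format visible in this thread; `parse_committed_tps` is a hypothetical helper, not Forge's own reporting code.

```rust
/// Extracts the committed TPS from a summary line containing
/// "committed: <number> txn/s". Returns None if the field is missing
/// or the number does not parse. (Hypothetical helper.)
pub fn parse_committed_tps(line: &str) -> Option<f64> {
    // Take everything after the first "committed: " marker.
    let rest = line.split("committed: ").nth(1)?;
    // The number is the first whitespace-delimited token after the marker.
    let number = rest.split_whitespace().next()?;
    number.parse::<f64>().ok()
}
```

Applied to the inner-traffic line above, the function yields `Some(14379.9)`; a line without a `committed:` field yields `None`.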

github-actions bot commented Nov 4, 2024

✅ Forge suite framework_upgrade success on 1086a5e00d773704731ab84fb4ed3538613b2250 ==> af43a82eedb9a01e6999536863d2a3344127315c

Compatibility test results for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> af43a82eedb9a01e6999536863d2a3344127315c (PR)
Upgrade the nodes to version: af43a82eedb9a01e6999536863d2a3344127315c
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1232.46 txn/s, submitted: 1234.69 txn/s, failed submission: 2.23 txn/s, expired: 2.23 txn/s, latency: 2411.34 ms, (p50: 2100 ms, p70: 2400, p90: 3600 ms, p99: 5400 ms), latency samples: 110440
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1290.66 txn/s, submitted: 1292.66 txn/s, failed submission: 2.00 txn/s, expired: 2.00 txn/s, latency: 2310.71 ms, (p50: 2100 ms, p70: 2400, p90: 3300 ms, p99: 5000 ms), latency samples: 116160
5. check swarm health
Compatibility test for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> af43a82eedb9a01e6999536863d2a3344127315c passed
Upgrade the remaining nodes to version: af43a82eedb9a01e6999536863d2a3344127315c
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1163.64 txn/s, submitted: 1166.05 txn/s, failed submission: 2.41 txn/s, expired: 2.41 txn/s, latency: 2569.19 ms, (p50: 2100 ms, p70: 2700, p90: 4200 ms, p99: 7200 ms), latency samples: 106180
Test Ok

github-actions bot commented Nov 5, 2024

✅ Forge suite compat success on 1086a5e00d773704731ab84fb4ed3538613b2250 ==> af43a82eedb9a01e6999536863d2a3344127315c

Compatibility test results for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> af43a82eedb9a01e6999536863d2a3344127315c (PR)
1. Check liveness of validators at old version: 1086a5e00d773704731ab84fb4ed3538613b2250
compatibility::simple-validator-upgrade::liveness-check : committed: 13945.12 txn/s, latency: 2439.52 ms, (p50: 1900 ms, p70: 2100, p90: 4200 ms, p99: 9600 ms), latency samples: 520480
2. Upgrading first Validator to new version: af43a82eedb9a01e6999536863d2a3344127315c
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6382.27 txn/s, latency: 4511.82 ms, (p50: 5200 ms, p70: 5500, p90: 5600 ms, p99: 5700 ms), latency samples: 114120
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5851.21 txn/s, latency: 5106.50 ms, (p50: 5600 ms, p70: 5700, p90: 5800 ms, p99: 6000 ms), latency samples: 217180
3. Upgrading rest of first batch to new version: af43a82eedb9a01e6999536863d2a3344127315c
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6801.53 txn/s, latency: 4207.22 ms, (p50: 4900 ms, p70: 5100, p90: 5200 ms, p99: 5400 ms), latency samples: 119600
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6360.27 txn/s, latency: 5162.40 ms, (p50: 5200 ms, p70: 5300, p90: 6800 ms, p99: 7100 ms), latency samples: 225780
4. upgrading second batch to new version: af43a82eedb9a01e6999536863d2a3344127315c
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 7805.05 txn/s, latency: 3520.29 ms, (p50: 3400 ms, p70: 4100, p90: 5400 ms, p99: 5800 ms), latency samples: 141880
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 7789.64 txn/s, latency: 4094.66 ms, (p50: 3800 ms, p70: 4600, p90: 6100 ms, p99: 6600 ms), latency samples: 259880
5. check swarm health
Compatibility test for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> af43a82eedb9a01e6999536863d2a3344127315c passed
Test Ok

@bchocho bchocho merged commit 610bb72 into main Nov 5, 2024
154 of 169 checks passed
@bchocho bchocho deleted the brian/graceful branch November 5, 2024 18:54
Labels
  • CICD:build-failpoints-images: Build failpoints docker image
  • CICD:build-images: when this label is present github actions will start build+push rust images from the PR.
  • CICD:build-performance-images: build performance docker image variants
4 participants