# Continuously run stable forge tests against the latest main branch.
name: Continuous Forge Tests - Stable
# We have various Forge Stable tests here that test out different situations and workloads.
#
# Dashboard showing historical results: https://grafana.aptoslabs.com/d/bdnt45ggsg000f/forge-stable-performance?orgId=1
# Tests are named based on how they are set up; some of the common flavors are:
# * "realistic-env" - tests with "realistic-env" in their name try to make the network and hardware environment
# more realistic. They use "wrap_with_realistic_env", which sets:
# * MultiRegionNetworkEmulationTest, which splits nodes into 4 "regions" with different
# cross-region and in-region latencies and reliability rates
# * CpuChaosTest, which tries to give nodes heterogeneous hardware by fully loading a few cores
# on a few nodes. But this is not too helpful, as block execution time variance is minimal
# (we generally have a few idle cores, and real variance mostly comes from variance in CPU speed instead)
# * sweep - means running multiple tests within a single test, keeping everything the same except for one
# thing - the thing we sweep over. There are two main dimensions we "sweep" over:
# * load sweep - this generally uses a const TPS workload, and varies the load across the tests (e.g. 10 vs 100 vs 1000 TPS)
# * workload sweep - this varies the transaction type being submitted, testing how the system behaves
# when different parts of the system are stressed (e.g. low vs high output sizes, good vs bad gas calibration, parallel vs sequential, etc.)
# * graceful - tests where we are overloading the system - i.e. submitting more transactions than we expect the system to handle -
# and seeing how it behaves. Overall e2e latency is then high, but we can check that only the validator -> block proposal stage latency has increased.
# Additionally, we generally add small-TPS high-fee traffic in these tests, to confirm it is unaffected by the high load.
# * changing-working-quorum - tests where we intentionally make nodes unreachable (cut their network), bring them back,
# and then cut the network on the next set of nodes - requiring state-sync to catch up, and consensus to work with a different set of
# nodes being required to form consensus. During each iteration, we test that enough progress was made.
#
# The main success criteria used across the tests are:
# * throughput and expiration/rejection rate
# * latency (avg / p50 / p90 / p99)
# * latency breakdown across the components - currently within a validator alone:
# batch->pos->proposal->ordered->committed
# TODO: we should add the other stages - before batch and after committed - to the success criteria, to be able to track them better
# * chain progress - checking the longest pause between blocks (both in the number of failed rounds and in absolute time) across the run
# * catchup - that all nodes can get past the same version at the end of the load (i.e. catching if any individual validator got stuck)
# * no restarts - fails if any node has restarted within the test
# * system metrics - checks for CPU and RAM utilization during the test
# TODO: we should add network bandwidth and disk IOPS as system metrics checks as well
#
# You can find more details about the individual tests below.
permissions:
issues: write
pull-requests: write
contents: read
id-token: write
actions: write # required for workflow cancellation via check-aptos-core
on:
# Allow triggering manually
workflow_dispatch:
inputs:
IMAGE_TAG:
required: false
type: string
description: The docker image tag to test. This may be a git SHA1, or a tag like "<branch>_<git SHA1>". If not specified, Forge will find the latest build based on the git history (starting from GIT_SHA input)
GIT_SHA:
required: false
type: string
description: The git SHA1 to checkout. This affects the Forge test runner that is used. If not specified, the latest main will be used
TEST_NAME:
required: true
type: choice
description: The specific stable test to run. If 'all', all stable tests will be run
default: 'all'
options:
- all
- framework-upgrade-test
# Test varies the load (i.e. sending 10, 100, 1000, 5000, etc. TPS), then measures
# onchain TPS, expiration rate, and p50/p90/p99 latency, among other things,
# testing that we don't degrade performance for low, mid, and high loads alike.
- realistic-env-load-sweep
# Test varies the workload across some basic workloads (i.e. some cheap, some expensive),
# and checks throughput and performance across different stages.
- realistic-env-workload-sweep
# Test sends a ConstTps workload above what the system can handle, while additionally sending
# sizeable high-fee traffic (1000 TPS), and measures overall system performance.
- realistic-env-graceful-overload
# Test varies the workload (opposite ends of gas calibration, high and low output sizes,
# sequential / parallel, etc.), and for each sends ConstTps above what the system can handle,
# while sending a low TPS of high-fee transactions. It primarily confirms that high-fee traffic
# has predictably low latency, and that the execution pipeline doesn't get backed up.
- realistic-env-graceful-workload-sweep
# Test varies the workload (user contracts) such that max throughput varies from high to mid to low,
# while testing that unrelated transactions paying the same gas price are able to go through.
- realistic-env-fairness-workload-sweep
# Test which tunes all configurations for the largest throughput possible (potentially sacrificing latency a bit)
- realistic-network-tuned-for-throughput
# Run a small-ish load, but check that at all times all nodes are making progress,
# catching any unexpected unreliability or delays in consensus.
- consensus-stress-test
# Send a mix of different workloads, to catch issues with interactions between workloads.
# This is MUCH MUCH less comprehensive than replay-verify, but the best we can do with the txn emitter at the moment.
- workload-mix-test
- single-vfn-perf
- fullnode-reboot-stress-test
- compat
# Send low TPS (100 TPS).
# Cut the network on enough nodes that all others are needed for consensus. Then bring a few back, and cut
# the same number of new ones - requiring all that were brought back to state-sync and continue executing.
# Check that in each iteration we were able to make meaningful progress.
- changing-working-quorum-test
# Same as changing-working-quorum-test above, just sending a bit higher load - 500 TPS.
# TODO - we should probably increase load here significantly
- changing-working-quorum-test-high-load
- pfn-const-tps-realistic-env
# Run a production config (same as the land-blocking run) at max load (via mempool backlog), but for 2 hours,
# to check the reliability and consistency of the network.
- realistic-env-max-load-long
JOB_PARALLELISM:
required: false
type: number
description: The number of test jobs to run in parallel. If not specified, defaults to 1
default: 1
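# As an illustration, a single stable test can be dispatched manually with the
# GitHub CLI (values below are hypothetical):
#   gh workflow run forge-stable.yaml --ref main \
#     -f TEST_NAME=realistic-env-load-sweep \
#     -f JOB_PARALLELISM=2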
schedule:
- cron: "0 22 * * 1-5" # The main branch cadence. This runs every Mon-Fri
pull_request:
paths:
- ".github/workflows/forge-stable.yaml"
- "testsuite/find_latest_image.py"
concurrency:
group: forge-stable-${{ format('{0}-{1}-{2}-{3}', github.ref_name, inputs.GIT_SHA, inputs.IMAGE_TAG, inputs.TEST_NAME) }}
cancel-in-progress: true
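# For example, a scheduled run on main has empty inputs, so the group resolves to
# "forge-stable-main---" (absent inputs format as empty strings), and a newly
# scheduled run cancels a still-running one.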
env:
AWS_ACCOUNT_NUM: ${{ secrets.ENV_ECR_AWS_ACCOUNT_NUM }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
IMAGE_TAG: ${{ inputs.IMAGE_TAG }} # this is only used for workflow_dispatch, otherwise defaults to empty
AWS_REGION: us-west-2
jobs:
generate-matrix:
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
imageVariants: ${{ steps.set-matrix.outputs.imageVariants }}
steps:
- name: Compute matrix
id: set-matrix
uses: actions/github-script@v7
env:
TEST_NAME: ${{ inputs.TEST_NAME }}
with:
result-encoding: string
script: |
const testName = process.env.TEST_NAME || 'all';
console.log(`Running job: ${testName}`);
const tests = [
{ TEST_NAME: 'framework-upgrade-test', FORGE_RUNNER_DURATION_SECS: 7200, FORGE_TEST_SUITE: 'framework_upgrade' },
{ TEST_NAME: 'realistic-env-load-sweep', FORGE_RUNNER_DURATION_SECS: 1800, FORGE_TEST_SUITE: 'realistic_env_load_sweep' },
{ TEST_NAME: 'realistic-env-workload-sweep', FORGE_RUNNER_DURATION_SECS: 2000, FORGE_TEST_SUITE: 'realistic_env_workload_sweep' },
{ TEST_NAME: 'realistic-env-graceful-overload', FORGE_RUNNER_DURATION_SECS: 1200, FORGE_TEST_SUITE: 'realistic_env_graceful_overload' },
{ TEST_NAME: 'realistic-env-graceful-workload-sweep', FORGE_RUNNER_DURATION_SECS: 2100, FORGE_TEST_SUITE: 'realistic_env_graceful_workload_sweep' },
{ TEST_NAME: 'realistic-env-fairness-workload-sweep', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'realistic_env_fairness_workload_sweep' },
{ TEST_NAME: 'realistic-network-tuned-for-throughput', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'realistic_network_tuned_for_throughput', FORGE_ENABLE_PERFORMANCE: true },
{ TEST_NAME: 'consensus-stress-test', FORGE_RUNNER_DURATION_SECS: 2400, FORGE_TEST_SUITE: 'consensus_stress_test' },
{ TEST_NAME: 'workload-mix-test', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'workload_mix' },
{ TEST_NAME: 'single-vfn-perf', FORGE_RUNNER_DURATION_SECS: 480, FORGE_TEST_SUITE: 'single_vfn_perf' },
{ TEST_NAME: 'fullnode-reboot-stress-test', FORGE_RUNNER_DURATION_SECS: 1800, FORGE_TEST_SUITE: 'fullnode_reboot_stress_test' },
{ TEST_NAME: 'compat', FORGE_RUNNER_DURATION_SECS: 300, FORGE_TEST_SUITE: 'compat' },
{ TEST_NAME: 'changing-working-quorum-test', FORGE_RUNNER_DURATION_SECS: 1200, FORGE_TEST_SUITE: 'changing_working_quorum_test', FORGE_ENABLE_FAILPOINTS: true },
{ TEST_NAME: 'changing-working-quorum-test-high-load', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'changing_working_quorum_test_high_load', FORGE_ENABLE_FAILPOINTS: true },
{ TEST_NAME: 'pfn-const-tps-realistic-env', FORGE_RUNNER_DURATION_SECS: 900, FORGE_TEST_SUITE: 'pfn_const_tps_with_realistic_env' },
{ TEST_NAME: 'realistic-env-max-load-long', FORGE_RUNNER_DURATION_SECS: 7200, FORGE_TEST_SUITE: 'realistic_env_max_load_large' }
];
const matrix = testName !== 'all' ? tests.filter(test => test.TEST_NAME === testName) : tests;
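// e.g. TEST_NAME='compat' keeps only the compat entry, while the default 'all' runs every suite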
core.debug(`Matrix: ${JSON.stringify(matrix)}`);
core.summary.addHeading('Forge Stable Run');
const testsToRunNames = matrix.map(test => `${test.TEST_NAME} (Needs Failpoint: ${test.FORGE_ENABLE_FAILPOINTS || false}, Needs Performance: ${test.FORGE_ENABLE_PERFORMANCE || false})`);
core.summary.addRaw("The following tests will be run:", true);
core.summary.addList(testsToRunNames);
await core.summary.write();
const needsFailpoints = matrix.some(test => test.FORGE_ENABLE_FAILPOINTS);
const needsPerformance = matrix.some(test => test.FORGE_ENABLE_PERFORMANCE);
let requiredImageVariants = [];
if (needsFailpoints) {
requiredImageVariants.push('failpoints');
}
if (needsPerformance) {
requiredImageVariants.push('performance');
}
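// e.g. requiredImageVariants ends up as ['failpoints', 'performance'] when both gated suites are selected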
core.setOutput('matrix', JSON.stringify({ include: matrix }));
core.setOutput('imageVariants', requiredImageVariants.join(' '));
# This job determines the image tag and branch to test, and passes them to the other jobs
# NOTE: this may be better as a separate workflow as the logic is quite complex but generalizable
determine-test-metadata:
runs-on: ubuntu-latest
needs: ["generate-matrix"]
outputs:
IMAGE_TAG: ${{ steps.get-docker-image-tag.outputs.IMAGE_TAG }}
IMAGE_TAG_FOR_COMPAT_TEST: ${{ steps.get-last-released-image-tag-for-compat-test.outputs.IMAGE_TAG }}
BRANCH: ${{ steps.determine-test-branch.outputs.BRANCH }}
BRANCH_HASH: ${{ steps.hash-branch.outputs.BRANCH_HASH }}
steps:
- uses: actions/checkout@v4
- name: Determine branch based on cadence
id: determine-test-branch
# NOTE: the schedule cron MUST match the one in the 'on.schedule.cron' section above
run: |
BRANCH=""
if [[ "${{ github.event_name }}" == "schedule" ]]; then
echo "BRANCH=main" >> $GITHUB_OUTPUT
elif [[ "${{ github.event_name }}" == "push" ]]; then
echo "BRANCH=${{ github.ref_name }}" >> $GITHUB_OUTPUT
# on workflow_dispatch, this will simply use the inputs.GIT_SHA given (or the default)
elif [[ -n "${{ inputs.GIT_SHA }}" ]]; then
echo "BRANCH=${{ inputs.GIT_SHA }}" >> $GITHUB_OUTPUT
# if GIT_SHA not provided, use the branch where workflow runs on
else
echo "BRANCH=${{ github.head_ref }}" >> $GITHUB_OUTPUT
fi
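# e.g. a scheduled run writes "BRANCH=main" to $GITHUB_OUTPUT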
echo "Branch: $(grep BRANCH= $GITHUB_OUTPUT)"
# Use the branch hash instead of the full branch name to stay under kubernetes namespace length limit
- name: Hash the branch
id: hash-branch
run: |
# Hashing the branch name
echo "BRANCH_HASH=$(echo -n "${{ steps.determine-test-branch.outputs.BRANCH }}" | sha256sum | cut -c1-10)" >> $GITHUB_OUTPUT
- uses: aptos-labs/aptos-core/.github/actions/check-aptos-core@main
with:
cancel-workflow: ${{ github.event_name == 'schedule' }} # Cancel the workflow if it is scheduled on a fork
# actions/get-latest-docker-image-tag requires docker utilities and authentication to internal docker image registries
- uses: aptos-labs/aptos-core/.github/actions/docker-setup@main
id: docker-setup
with:
GCP_WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
GCP_SERVICE_ACCOUNT_EMAIL: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }}
EXPORT_GCP_PROJECT_VARIABLES: "false"
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DOCKER_ARTIFACT_REPO: ${{ secrets.AWS_DOCKER_ARTIFACT_REPO }}
GIT_CREDENTIALS: ${{ secrets.GIT_CREDENTIALS }}
- uses: aptos-labs/aptos-core/.github/actions/get-latest-docker-image-tag@main
id: get-docker-image-tag
with:
branch: ${{ steps.determine-test-branch.outputs.BRANCH }}
variants: ${{ needs.generate-matrix.outputs.imageVariants }}
- uses: aptos-labs/aptos-core/.github/actions/determine-or-use-target-branch-and-get-last-released-image@main
id: get-last-released-image-tag-for-compat-test
with:
base-branch: ${{ steps.determine-test-branch.outputs.BRANCH }}
variants: ${{ needs.generate-matrix.outputs.imageVariants }}
- name: Write summary
run: |
IMAGE_TAG=${{ steps.get-docker-image-tag.outputs.IMAGE_TAG }}
IMAGE_TAG_FOR_COMPAT_TEST=${{ steps.get-last-released-image-tag-for-compat-test.outputs.IMAGE_TAG }}
TARGET_BRANCH_TO_FETCH_IMAGE_FOR_COMPAT_TEST=${{ steps.get-last-released-image-tag-for-compat-test.outputs.TARGET_BRANCH }}
BRANCH=${{ steps.determine-test-branch.outputs.BRANCH }}
if [ -n "${BRANCH}" ]; then
echo "BRANCH: [${BRANCH}](https://github.com/${{ github.repository }}/tree/${BRANCH})" >> $GITHUB_STEP_SUMMARY
fi
echo "IMAGE_TAG: [${IMAGE_TAG}](https://github.com/${{ github.repository }}/commit/${IMAGE_TAG})" >> $GITHUB_STEP_SUMMARY
echo "To cancel this job, do `pnpm infra ci cancel-workflow ${{ github.run_id }}` from internal-ops" >> $GITHUB_STEP_SUMMARY
run:
name: forge-${{ matrix.TEST_NAME }}
needs: [determine-test-metadata, generate-matrix]
if: ${{ github.event_name != 'pull_request' }}
strategy:
fail-fast: false
max-parallel: ${{ inputs.JOB_PARALLELISM && fromJson(inputs.JOB_PARALLELISM) || 1 }}
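# the JOB_PARALLELISM input may surface as a string, so fromJson coerces it to a number; an unset input falls back to 1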
matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
uses: aptos-labs/aptos-core/.github/workflows/workflow-run-forge.yaml@main
secrets: inherit
with:
IMAGE_TAG: ${{ ((matrix.FORGE_TEST_SUITE == 'compat' && needs.determine-test-metadata.outputs.IMAGE_TAG_FOR_COMPAT_TEST) || needs.determine-test-metadata.outputs.IMAGE_TAG) }}
FORGE_NAMESPACE: forge-${{ matrix.TEST_NAME }}-${{ needs.determine-test-metadata.outputs.BRANCH_HASH }}
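# e.g. resolves to something like forge-realistic-env-load-sweep-1a2b3c4d5e (suffix is the branch hash; value illustrative)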
FORGE_TEST_SUITE: ${{ matrix.FORGE_TEST_SUITE }}
FORGE_RUNNER_DURATION_SECS: ${{ matrix.FORGE_RUNNER_DURATION_SECS }}
FORGE_ENABLE_PERFORMANCE: ${{ matrix.FORGE_ENABLE_PERFORMANCE || false }}
FORGE_ENABLE_FAILPOINTS: ${{ matrix.FORGE_ENABLE_FAILPOINTS || false }}
POST_TO_SLACK: true
SEND_RESULTS_TO_TRUNK: true