
EPMB Benchmarking: Make it work in CI #7714

Open
kianenigma opened this issue Feb 25, 2025 · 7 comments

@kianenigma
Contributor

At the moment, we cannot run the benchmarks for any of the multi-block election pallets in CI or via the command bot.

These pallets are:

pallet_election_provider_multi_block
pallet_election_provider_multi_block::signed
pallet_election_provider_multi_block::unsigned
pallet_election_provider_multi_block::verifier

At the moment, these pallets only exist in the substrate-node runtime.

The reasons we cannot benchmark them in CI are (a combined invocation is sketched after this list):

  • The benchmarks need more memory than usual execution, so we need to pass --heap-pages 65000.
  • These pallets are meant to be benchmarked with --default-pov-mode measured.
  • These pallets require the environment variables VALIDATORS=1000 NOMINATORS=25000 to be set. These in turn create a genesis config in the substrate node that is large enough to run a full-size election and benchmark it properly.
  • At the moment, these environment variables only take effect if feature = "staking-playground" is also set in staging-node-cli. Ref. We can remove this if statement if need be and always honor these environment variables.

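Putting these requirements together, a single invocation looks roughly like this (a sketch assembled from the full commands listed further below):

VALIDATORS=1000 NOMINATORS=25000 cargo run --release -p staging-node-cli --bin substrate-node --features runtime-benchmarks,staking-playground -- benchmark pallet --chain dev --pallet pallet_election_provider_multi_block --extrinsic all --heap-pages 65000 --default-pov-mode measured
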
For the time being, I have circumvented this via:

run: forklift cargo run --locked --release -p staging-node-cli --bin substrate-node --features runtime-benchmarks --quiet -- benchmark pallet --chain dev --pallet "*" --exclude-pallets=pallet_election_provider_multi_block,pallet_election_provider_multi_block::signed,pallet_election_provider_multi_block::unsigned,pallet_election_provider_multi_block::verifier --extrinsic "*" --steps 2 --repeat 1 --quiet

And for actual weight generation, I have to manually connect to the reference hardware machine to produce the updated weights, which is rather slow.

Moreover, it is a sloppy hack from my side and should be fixed.

Fixing the command bot

The easier path is to enable the command bot to support this. It would need to (a hypothetical invocation is sketched after this list):

  1. support setting custom environment variables
  2. support passing additional benchmarking arguments
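
For illustration, a bot invocation combining both might look like this (a hypothetical sketch: neither the --env flags nor the trailing passthrough arguments exist in the current /cmd bench syntax):

/cmd bench --runtime dev --pallet pallet_election_provider_multi_block --env VALIDATORS=1000 --env NOMINATORS=25000 -- --heap-pages 65000 --default-pov-mode measured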

Fixing the CI

I am less sure whether this is feasible. The environment variables in question will slow down weight generation for all pallets. We should only attempt this if either:

  1. the additional time is acceptable, or
  2. we can configure the runtime matrix to set the environment variables only when benchmarking these pallets. Maybe worth exploring; see the sketch below.
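
For illustration, if the benchmarking job iterated over pallets in a shell loop (an assumption; today the CI benchmarks everything in one command, as shown above), option 2 could scope the heavy settings like this, with PALLET as a hypothetical loop variable:

case "${PALLET}" in
  pallet_election_provider_multi_block*)
    # Heavy genesis and extra flags only for the multi-block election pallets.
    export VALIDATORS=1000 NOMINATORS=25000
    EXTRA_ARGS="--heap-pages 65000 --default-pov-mode measured"
    ;;
  *)
    unset VALIDATORS NOMINATORS
    EXTRA_ARGS=""
    ;;
esac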

For reference, here is how I currently benchmark these pallets:

STEPS=2
REPEAT=3
VALIDATOR_COUNT=300
NOMINATORS=25000
VALIDATORS=1000
LOG="runtime::multiblock-election=info,runtime::staking=info"

# All measured benchmarks (one run per pallet; same flags throughout)
for PALLET in \
  pallet_election_provider_multi_block \
  pallet_election_provider_multi_block::unsigned \
  pallet_election_provider_multi_block::signed \
  pallet_election_provider_multi_block::verifier
do
  RUST_LOG=${LOG} VALIDATOR_COUNT=${VALIDATOR_COUNT} VALIDATORS=${VALIDATORS} NOMINATORS=${NOMINATORS} RUST_BACKTRACE=1 WASM_BACKTRACE=1 \
    cargo run --release -p staging-node-cli --bin substrate-node --features runtime-benchmarks --features staking-playground -- \
    benchmark pallet --chain dev --pallet "${PALLET}" --extrinsic all --steps ${STEPS} --repeat ${REPEAT} \
    --template substrate/.maintain/frame-weight-template.hbs --heap-pages 65000 --default-pov-mode measured --output ../measured
done

Note that we could move --default-pov-mode measured into the code instead of passing it on the command line.

@mordamax
Contributor

Can these values

VALIDATOR_COUNT=300
NOMINATORS=25000
VALIDATORS=1000

be the defaults (regardless of env variables) if we build with --features runtime-benchmarks?

@re-gius
Contributor

re-gius commented Feb 26, 2025

Fixing the command bot
I think this adds value regardless of the specific benchmarking issue.

Fixing the CI
I would prevent ad-hoc changes from affecting all the other benchmarks, even if the only cost is additional time, mainly to keep the CI easy to maintain. So I would look for a way to set these parameters only for the multi-block election pallets.

@kianenigma
Contributor Author

Can these values

VALIDATOR_COUNT=300
NOMINATORS=25000
VALIDATORS=1000

be the defaults (regardless of env variables) if we build with --features runtime-benchmarks?

They would slow down chain-spec generation for all pallets if they were.

I agree with @re-gius that, if anything, we should just allow additional args/env variables to be set in the cmd bot.

@re-gius
Contributor

re-gius commented Feb 26, 2025

I can work on the command bot and keep you updated here

@re-gius
Contributor

re-gius commented Feb 27, 2025

Working on it in #7747

@kianenigma
Contributor Author

@mordamax let's discuss the approach here. Some comments I received:

  1. Security: indeed, if someone sets NOMINATORS to a large value, the job will take longer, so it is a valid concern. But I also don't see the boundary: where do we draw the line on something being a security concern? The command bot is already available to everyone, and its execution can take long if someone spams it. We should say that setting these variables is allowed, but only up to a limit (see the sketch below), and perhaps Parity's core developers can configure a higher value. Weights are in general sensitive and time-consuming to generate.
  2. Reproducibility: this is a misunderstanding. Running with two different sets of environment variables will indeed give two different results, and that is fine and expected.
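
A minimal sketch of such a limit, assuming the bot validates user input in a shell step (MAX_NOMINATORS=50000 is an arbitrary placeholder, not a decided value):

# Clamp user-supplied values before running the benchmarks.
MAX_NOMINATORS=50000
if [ "${NOMINATORS:-0}" -gt "${MAX_NOMINATORS}" ]; then
  NOMINATORS=${MAX_NOMINATORS}
fi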

@mordamax
Contributor

mordamax commented Mar 4, 2025

TLDR: changing cmd.yml is tricky, and we should do this only if it is really necessary for the majority of use cases. For more narrow cases it's better to make the changes in cmd.py, which you can test right away from your original PR and cover with unit tests to make sure you're propagating the --flags correctly and not affecting other logic (see test_cmd.py).

How to test cmd.yml changes:
We use https://github.com/paritytech-stg/polkadot-sdk/ to test cmd.yml, as the only way to test it is to have it in master. That's the easiest path (although, in general, it is indeed hard to develop CI with an on: issue_comment type of trigger).
Steps:

  1. make your changes in the paritytech/polkadot-sdk re-gius/test-cmd-bench-flags branch
  2. sync paritytech-stg from upstream
    • small note: if you see that polkadot-sdk is not synced and there are extra commits, that might signal that someone is still working on it; check whose commits are ahead and ask if you can override them
  3. clone paritytech-stg/polkadot-sdk in a separate dir
    a) from there, run git fetch upstream re-gius/test-cmd-bench-flags && git rebase upstream/re-gius/test-cmd-bench-flags && git push origin HEAD -f (check that you have upstream in git remote -v), if you have access to force-push (ask the CI/DevOps team to grant access)
    b) or just open a normal PR and merge it to master of paritytech-stg/polkadot-sdk
    Once you have pushed to master, you can create any PR in paritytech-stg/polkadot-sdk and it should already use the cmd.yml from your branch.

Also a small note from me: for LOG='runtime::multiblock-election=info,runtime::staking=info', don't add quotes if possible, as they are restricted by the regex, as is ,.
Re. adding ' and , to the regex: I'm not sure that's a good idea. I don't remember the use case we tried to eliminate, but AFAIR quotes might be a source of a potential injection attack; maybe that's no longer relevant.
The best would be to clarify with @alvicsam & @pavelsupr to check that case.
So if something doesn't work as you expect in cmd.yml, it's highly likely by design :)

Re. logs: if they are typical for this runtime, they could be added to https://github.com/paritytech/polkadot-sdk/blob/master/.github/workflows/runtimes-matrix.json
You can also adjust logs per pallet/runtime in cmd.py.

Adding these as arguments could complicate the usage of the cmd bot, as many people use it for many use cases; it should remain simple. So for an edge case like this, consider adding logic to cmd.py instead, or configure it via the global settings in runtimes-matrix.json, as no other runtimes/pallets will require these.

That was ~95% of what I said in DM, sharing here :)
