
EPMB Benchmarking: Make it work in CI #7714

Open
kianenigma opened this issue Feb 25, 2025 · 7 comments

@kianenigma
Contributor

At the moment, we cannot run the benchmarks for any of the multi-block election pallets in CI or via the command bot.

These pallets are:

pallet_election_provider_multi_block
pallet_election_provider_multi_block::signed
pallet_election_provider_multi_block::unsigned
pallet_election_provider_multi_block::verifier

At the moment, these pallets only exist in the substrate-node runtime.

The reasons we cannot benchmark them in CI are (a combined invocation is sketched after this list):

  • The benchmarks need more memory than usual execution, so we need to pass --heap-pages 65000.
  • These pallets are meant to be benchmarked with --default-pov-mode measured.
  • These pallets require the environment variables VALIDATORS=1000 NOMINATORS=25000 to be set. These in turn create a genesis config in the substrate node that is large enough to run a full-size election and benchmark it properly.
  • At the moment, these environment variables only take effect if feature = "staking-playground" is also set in staging-node-cli. Ref. We can remove this if statement if need be and always honor these environment variables.

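Putting these requirements together, a single invocation looks roughly like this (a sketch assembled from the full commands listed further below):

VALIDATORS=1000 NOMINATORS=25000 cargo run --release -p staging-node-cli --bin substrate-node --features runtime-benchmarks,staking-playground -- benchmark pallet --chain dev --pallet pallet_election_provider_multi_block --extrinsic all --heap-pages 65000 --default-pov-mode measured
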
For the time being, I have circumvented this via:

run: forklift cargo run --locked --release -p staging-node-cli --bin substrate-node --features runtime-benchmarks --quiet -- benchmark pallet --chain dev --pallet "*" --exclude-pallets=pallet_election_provider_multi_block,pallet_election_provider_multi_block::signed,pallet_election_provider_multi_block::unsigned,pallet_election_provider_multi_block::verifier --extrinsic "*" --steps 2 --repeat 1 --quiet

And for actual weight generation, I have to manually connect to the reference hardware machine to produce the updated weights, which is rather slow.

Moreover, it is a sloppy hack from my side and should be fixed.

Fixing the command bot

The easier path is to enable the command bot to support this. It would need to (a hypothetical invocation is sketched after this list):

  1. support setting custom environment variables
  2. support passing additional benchmarking arguments
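
For illustration, a bot invocation combining both might look like this (a hypothetical sketch: neither the --env flags nor the trailing passthrough arguments exist in the current /cmd bench syntax):

/cmd bench --runtime dev --pallet pallet_election_provider_multi_block --env VALIDATORS=1000 --env NOMINATORS=25000 -- --heap-pages 65000 --default-pov-mode measured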

Fixing the CI

I am less sure whether this is feasible. The environment variables in question will slow down weight generation for all pallets. We should only attempt this if either:

  1. the additional time is acceptable, or
  2. we can configure the runtime matrix to set the environment variables only when benchmarking these pallets. Maybe worth exploring; see the sketch below.
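
For illustration, if the benchmarking job iterated over pallets in a shell loop (an assumption; today the CI benchmarks everything in one command, as shown above), option 2 could scope the heavy settings like this, with PALLET as a hypothetical loop variable:

case "${PALLET}" in
  pallet_election_provider_multi_block*)
    # Heavy genesis and extra flags only for the multi-block election pallets.
    export VALIDATORS=1000 NOMINATORS=25000
    EXTRA_ARGS="--heap-pages 65000 --default-pov-mode measured"
    ;;
  *)
    unset VALIDATORS NOMINATORS
    EXTRA_ARGS=""
    ;;
esac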

For reference, here is how I currently benchmark these pallets:

STEPS=2
REPEAT=3
VALIDATOR_COUNT=300
NOMINATORS=25000
VALIDATORS=1000
LOG="runtime::multiblock-election=info,runtime::staking=info"

# All measured benchmarks (one run per pallet; same flags throughout)
for PALLET in \
  pallet_election_provider_multi_block \
  pallet_election_provider_multi_block::unsigned \
  pallet_election_provider_multi_block::signed \
  pallet_election_provider_multi_block::verifier
do
  RUST_LOG=${LOG} VALIDATOR_COUNT=${VALIDATOR_COUNT} VALIDATORS=${VALIDATORS} NOMINATORS=${NOMINATORS} RUST_BACKTRACE=1 WASM_BACKTRACE=1 \
    cargo run --release -p staging-node-cli --bin substrate-node --features runtime-benchmarks --features staking-playground -- \
    benchmark pallet --chain dev --pallet "${PALLET}" --extrinsic all --steps ${STEPS} --repeat ${REPEAT} \
    --template substrate/.maintain/frame-weight-template.hbs --heap-pages 65000 --default-pov-mode measured --output ../measured
done

Note that we could move --default-pov-mode measured into the code instead of passing it on the command line.

@mordamax
Contributor

Can these values

VALIDATOR_COUNT=300
NOMINATORS=25000
VALIDATORS=1000

be the defaults (regardless of env variables) if we build with --features runtime-benchmarks?

@re-gius
Contributor

re-gius commented Feb 26, 2025

Fixing the command bot
I think this adds value regardless of the specific benchmarking issue.

Fixing the CI
I would prevent ad-hoc changes from affecting all the other benchmarks, even if the only cost is additional time, mainly to keep the CI easy to maintain. So I would look for a way to set these parameters only for the multi-block election pallets.

@kianenigma
Contributor Author

Can these values

VALIDATOR_COUNT=300
NOMINATORS=25000
VALIDATORS=1000

be the defaults (regardless of env variables) if we build with --features runtime-benchmarks?

They would slow down chain-spec generation for all pallets if they were.

I agree with @re-gius that, if anything, we should just allow additional args/env variables to be set in the cmd bot.

@re-gius
Contributor

re-gius commented Feb 26, 2025

I can work on the command bot and keep you updated here

@re-gius
Contributor

re-gius commented Feb 27, 2025

Working on it in #7747

@kianenigma
Contributor Author

@mordamax let's discuss the approach here. Some comments I received:

  1. Security: indeed, if someone sets NOMINATORS to a large value, the job will take longer, so it is a valid concern. But I also don't see the boundary: where do we draw the line on something being a security concern? The command bot is already available to everyone, and its execution can take long if someone spams it. We should say that setting these variables is allowed, but only up to a limit (see the sketch below), and perhaps Parity's core developers can configure a higher value. Weights are in general sensitive and time-consuming to generate.
  2. Reproducibility: this is a misunderstanding. Running with two different sets of environment variables will indeed give two different results, and that is fine and expected.
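
A minimal sketch of such a limit, assuming the bot validates user input in a shell step (MAX_NOMINATORS=50000 is an arbitrary placeholder, not a decided value):

# Clamp user-supplied values before running the benchmarks.
MAX_NOMINATORS=50000
if [ "${NOMINATORS:-0}" -gt "${MAX_NOMINATORS}" ]; then
  NOMINATORS=${MAX_NOMINATORS}
fi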

@mordamax
Contributor

mordamax commented Mar 4, 2025

TLDR: changing cmd.yml is tricky, and we should do this only if it is really necessary for the majority of use cases. For more narrow cases it's better to make the changes in cmd.py, which you can test right away from your original PR and cover with unit tests to make sure you're propagating the --flags correctly and not affecting other logic (see test_cmd.py).

How to test cmd.yml changes:
We use https://github.com/paritytech-stg/polkadot-sdk/ to test cmd.yml, as the only way to test it is to have it in master. That's the easiest path (although, in general, it is indeed hard to develop CI with an on: issue_comment type of trigger).
Steps:

  1. make your changes in the paritytech/polkadot-sdk re-gius/test-cmd-bench-flags branch
  2. sync paritytech-stg from upstream
    • small note: if you see that polkadot-sdk is not synced and there are extra commits, that might signal that someone is still working on it; check whose commits are ahead and ask if you can override them
  3. clone paritytech-stg/polkadot-sdk in a separate dir
    a) from there, run git fetch upstream re-gius/test-cmd-bench-flags && git rebase upstream/re-gius/test-cmd-bench-flags && git push origin HEAD -f (check that you have upstream in git remote -v), if you have access to force-push (ask the CI/DevOps team to grant access)
    b) or just open a normal PR and merge it to master of paritytech-stg/polkadot-sdk
    Once you have pushed to master, you can create any PR in paritytech-stg/polkadot-sdk and it should already use the cmd.yml from your branch.

Also a small note from me: for LOG='runtime::multiblock-election=info,runtime::staking=info', don't add quotes if possible, as they are restricted by the regex, as is ,.
Re. adding ' and , to the regex: I'm not sure that's a good idea. I don't remember the use case we tried to eliminate, but AFAIR quotes might be a source of a potential injection attack; maybe that's no longer relevant.
The best would be to clarify with @alvicsam & @pavelsupr to check that case.
So if something doesn't work as you expect in cmd.yml, it's highly likely by design :)

Re. logs: if they are typical for this runtime, they could be added to https://github.com/paritytech/polkadot-sdk/blob/master/.github/workflows/runtimes-matrix.json
You can also adjust logs per pallet/runtime in cmd.py.

Adding these as arguments could complicate the usage of the cmd bot, as many people use it for many use cases; it should remain simple. So for an edge case like this, consider adding logic to cmd.py instead, or configure it via the global settings in runtimes-matrix.json, as no other runtimes/pallets will require these.

That was ~95% of what I said in DM, sharing here :)
