Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Tracking issue: Pre-checking for PVF Compilation time #3211

Closed
24 tasks done
rphmeier opened this issue Jun 11, 2021 · 4 comments · Fixed by #7559
Closed
24 tasks done

Tracking issue: Pre-checking for PVF Compilation time #3211

rphmeier opened this issue Jun 11, 2021 · 4 comments · Fixed by #7559
Assignees
Labels
I2-security The node fails to follow expected, security-sensitive, behaviour. T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes.

Comments

@rphmeier
Copy link
Contributor

rphmeier commented Jun 11, 2021

We are going to use this issue for tracking progress on the PVF compilation time pre-checking.

Original Description

Problem

If compilation of a Parachain Validation Function (PVF) takes too long or uses too much memory, this can leave a node in limbo as to whether a candidate of that parachain is valid or not.

The amount of time that a PVF takes to compile is a subjective resource limit and as such PVFs may be maliciously crafted so that there is e.g. a 50/50 split of validators which can and cannot compile and execute the PVF.

This has the following implications:

  • In backing, inclusion may be slow due to backing groups being unable to execute the block
  • In approval checking, there may be many no-shows, leading to slow finality
  • In disputes, neither side may reach supermajority. Nobody will get slashed and the chain will not be reverted or finalized.

As a result of this issue we need a fairly hard guarantee that the PVFs of registered parachains/threads can be compiled within a reasonable amount of time.

Solution

To have all validators pre-check that the amount of time the validation code takes to compile is under some reasonable limit. Before a parathread can be registered by users, and before a PVF upgrade can be applied by any parathread or chain, the validation code must be signed off on by 2/3 of the current validator set as taking less than some fixed timeout, specified on-chain.

Validators should constantly observe the chain to learn about PVFs which are pending approval, and should acquire and compile these PVFs. They then submit an unsigned transaction in either the affirmative or negative about the PVF. PVFs which don't receive a supermajority of validators attesting to low time taken to compile, within a certain amount of blocks, are discarded.

When compiling PVFs outside of this check, a much more permissive timeout should be used, and as such the artifact will eventually be compiled and approval checks, dispute votes, etc. will be issued, leading to no stall of the relay chain or of finality.

As a downside, this does mean that every change to the PVF requires additional work to be done by every validator at the time of submission, which means that parathread registration and upgrades should be more expensive / more limited.

Alternative Solution: Approval Checking

Alternatively, we could have approval checking check that:
a) the Wasm blob produced by any candidate compiles within a reasonable amount of time
b) any new Wasm blob registered on-chain by a transaction compiles within a reasonable amount of time

However, there are issues with this solution.

  1. it is unclear whether validators that find that either a or b doesn't hold should issue a dispute or not. If they do issue a dispute, then it is possible that they or other validators will be wrongly slashed or the dispute will reach no conclusion.
  2. If the validators don't issue a dispute and the block never gets finalized, then the relay chain should be eventually reorganize to avoid the block in question. However, there is nothing in the runtime stopping the offending transaction or pending PVF upgrade from being re-submitted, leading to an infinite cycle of relay chain abandonment.

For these reasons, the initial pre-check should be used.

@rphmeier rphmeier added the I2-security The node fails to follow expected, security-sensitive, behaviour. label Jun 11, 2021
@burdges
Copy link
Contributor

burdges commented Jun 11, 2021

It's tricky.. I've changed my tune somewhat..

In my mind, approval checking had three advantages: First, we avoid all validators checking parathread code upgrades, but an alternative would simply be making parathread code upgrades expensive. Second we're already building a approvals and disputes procedure, so this seems simplest, but yes important differences exist. We'll also want new validators to join, and for some validators to churn between relay chains when we move towards multiple relay chains, which both require examining our assumptions about the build artifacts caching, so lazy builds sounded nice, but maybe this sucks.

Approval checking would require disputes and slashing since parachains could repeatedly register invalid code until one passed approval checks, which then enables attacks. Is this worth the risk if everyone needs to build the code eventually anyways and validators all have the storage to cache the artifact? It's trick but maybe no, so an approval checking design might start parachain code upgrade blocks already escalated.

As for parathreads, we'll maybe have more confidence in build times but the time we build parathreads, but maybe we just make them pay more for upgrades.

We've accepted a model where parachain code gets referenced by hash, so multiple code blobs exist simultaneously already. If parachain code upgrades are special blocks that do not otherwise interact with XCMP, etc., then we need not roll back invalid code upgrades and could merely invalidate the code upgrade. Yes, this sounds like a big difference from the usual inclusion pipeline.

At some level I think this boils down primarily to slashing: Are we happy with slashing and forensics for build failures? If no, then we need escalated build checking, which no longer looks much like approvals.

@burdges
Copy link
Contributor

burdges commented Jun 11, 2021

We might leverage the approval's logic for grandpa though: If a parachain posts a code block then we transport the code to all validators, perhaps via availability and reconstructing, and then all validators build the code. We then block finality upon build results similarly to blocking finality upon approval votes. We do not currently know about finality on the relay chain though, so I guess this buys us nothing really.

@pepyakin pepyakin changed the title Pre-checking for PVF Compilation time Tracking issue: Pre-checking for PVF Compilation time Aug 23, 2021
@pepyakin pepyakin self-assigned this Aug 23, 2021
pepyakin added a commit that referenced this issue Nov 29, 2021
Right now, most of operations that sign stuff in polkadot protocol are
handled by a very convenient tool - `Signed`. However `Signed` assumes
that whatever is signed is anchored to some `parent_hash` which works
for most cases, but does not work for others.

One instance of such a case is pre-checking (#3211). There validators
submit signed votes on-chain. A vote is valid for the entire session. If
we were to use `Signed` we would have to root a vote in some block of
that session and during vote verification check that this block is
indeed within the session. This is especially annoying since we agreed
to use unsigned extrinsics to submit votes and we need to make the
unsigned extrinsic validation as slim as possible.

(FWIW, the definition of a pre-checking vote can be seen in the next
diff in the stack)

That's the reason why we opted-out from using `Signed` for pre-checking
and decided to go with the manual signing approach. Almost every piece
of machinery is in place except for signing which is presented in this
PR.
pepyakin added a commit that referenced this issue Nov 29, 2021
This is an insubstantial PR that just unlocks PRs down the line. This PR
is a part of #3211.

Regarding the `PvfCheckStatement` struct itself: this is a structure
that will be used to convert from/into the binary representation and
ultimately will be used to sign and submit votes onto chain.
pepyakin added a commit that referenced this issue Nov 29, 2021
Right now, most of operations that sign stuff in polkadot protocol are
handled by a very convenient tool - `Signed`. However `Signed` assumes
that whatever is signed is anchored to some `parent_hash` which works
for most cases, but does not work for others.

One instance of such a case is pre-checking (#3211). There validators
submit signed votes on-chain. A vote is valid for the entire session. If
we were to use `Signed` we would have to root a vote in some block of
that session and during vote verification check that this block is
indeed within the session. This is especially annoying since we agreed
to use unsigned extrinsics to submit votes and we need to make the
unsigned extrinsic validation as slim as possible.

(FWIW, the definition of a pre-checking vote can be seen in the next
diff in the stack)

That's the reason why we opted-out from using `Signed` for pre-checking
and decided to go with the manual signing approach. Almost every piece
of machinery is in place except for signing which is presented in this
PR.
pepyakin added a commit that referenced this issue Nov 29, 2021
This is an insubstantial PR that just unlocks PRs down the line. This PR
is a part of #3211.

Regarding the `PvfCheckStatement` struct itself: this is a structure
that will be used to convert from/into the binary representation and
ultimately will be used to sign and submit votes onto chain.
pepyakin added a commit that referenced this issue Nov 29, 2021
This PR is a part of
#3211.

This PR prepares ground for the following runtime changes required for
PVF pre-checking. Specifically, we do several changes here:

1. We remove `validation_code_at` and `validation_code_hash_at`. Those
   functions are not used. They were added in the early days with intent
   to use it later but turned out that we do not need them.
2. We replace `validation_code_hash_at` with just `current_code_hash`
   for the case of inclusion and candidate checking.
3. We also replace `last_code_upgrade` with a direct query into
   `FutureCodeHash` and `UpgradeRestrictionSignal`. Those in conjunction
   should replace the logic that was used for allowing/disallowing
   upgrades. This requires special attention of the reviewers.
4. Then we remove the machinery required to support those queries.
   Specifically the code related to `UseCodeAt`. We do not need it since
   we do not answer the historical queries. However, we still leave all
   the data on-chain. At some point we may clean it up, but that would
   be needed to be done with a dedicated migration which can be done as
   follow-up.
5. Some now irrelevant tests were removed and/or adapted.
pepyakin added a commit that referenced this issue Nov 30, 2021
Right now, most of operations that sign stuff in polkadot protocol are
handled by a very convenient tool - `Signed`. However `Signed` assumes
that whatever is signed is anchored to some `parent_hash` which works
for most cases, but does not work for others.

One instance of such a case is pre-checking (#3211). There validators
submit signed votes on-chain. A vote is valid for the entire session. If
we were to use `Signed` we would have to root a vote in some block of
that session and during vote verification check that this block is
indeed within the session. This is especially annoying since we agreed
to use unsigned extrinsics to submit votes and we need to make the
unsigned extrinsic validation as slim as possible.

(FWIW, the definition of a pre-checking vote can be seen in the next
diff in the stack)

That's the reason why we opted-out from using `Signed` for pre-checking
and decided to go with the manual signing approach. Almost every piece
of machinery is in place except for signing which is presented in this
PR.
pepyakin added a commit that referenced this issue Nov 30, 2021
* pvf-precheck: Add `sign` in subsystem-util

Right now, most of operations that sign stuff in polkadot protocol are
handled by a very convenient tool - `Signed`. However `Signed` assumes
that whatever is signed is anchored to some `parent_hash` which works
for most cases, but does not work for others.

One instance of such a case is pre-checking (#3211). There validators
submit signed votes on-chain. A vote is valid for the entire session. If
we were to use `Signed` we would have to root a vote in some block of
that session and during vote verification check that this block is
indeed within the session. This is especially annoying since we agreed
to use unsigned extrinsics to submit votes and we need to make the
unsigned extrinsic validation as slim as possible.

(FWIW, the definition of a pre-checking vote can be seen in the next
diff in the stack)

That's the reason why we opted-out from using `Signed` for pre-checking
and decided to go with the manual signing approach. Almost every piece
of machinery is in place except for signing which is presented in this
PR.

* pvf-precheck: Add `PvfCheckStatement` to polkadot-primitives

This is an insubstantial PR that just unlocks PRs down the line. This PR
is a part of #3211.

Regarding the `PvfCheckStatement` struct itself: this is a structure
that will be used to convert from/into the binary representation and
ultimately will be used to sign and submit votes onto chain.
drahnr pushed a commit that referenced this issue Dec 16, 2021
* pvf-precheck: Integrate PVF pre-checking into paras module

Closes #4009

This is the most of the runtime-side change needed for #3211.

Here is how it works.

The PVF pre-checking can be triggered either by an upgrade or by
onboarding (i.e. calling `schedule_para_initialize`). The PVF
pre-checking process is identified by the PVF code hash that is being
voted on. If there is already PVF pre-checking process running, then no
new PVF pre-checking process will be started. Instead, we just subscribe
to the existing one.

If there is no PVF pre-checking process running but the PVF code hash
was already saved in the storage, that necessarily means (I invite the
reviewers to double-check this invariant) that the PVF already passed
pre-checking. This is equivalent to instant approving of the PVF.

The pre-checking process can be concluded either by obtaining a
supermajority or if it expires.

Each validator checks the list of PVFs available for voting. The vote is
binary, i.e. accept or reject a given PVF. As soon as the supermajority
of votes are collected for one of the sides of the vote, the voting is
concluded in that direction and the effects of the voting are enacted.

Only validators from the active set can participate in the vote. The set
of active validators can change each session. That's why we reset the
votes each session. A voting that observed a certain number of sessions
will be rejected.

The effects of the PVF accepting depend on the operations requested it:

1. All onboardings subscribed to the approved PVF pre-checking process will
get scheduled and after passing 2 session boundaries they will be onboarded.
2. All upgrades subscribed to the approved PVF pre-checking process will
get scheduled very similarly to the existing process. Upgrades with
pre-checking are really the same process that is just delayed by the
time required for pre-checking voting. In case of instant approval the
mechanism is exactly the same. This is important from parachains
compatibility standpoint since following the delayed upgrade requires
the parachain to implement
paritytech/cumulus#517.

In case, PVF pre-checking process was concluded with rejection, then all
the requesting operations get cancelled. For onboarding it means it gets
without movement: the lifecycle of such parachain is terminated on the
`Onboarding` state and after rejection the lifecycle is none. That in
turn means that the caller can attempt registering the parachain once
more. For upgrading it means that the upgrade process is aborted: that
flashes go-ahead signal with `Abort` flag.

Rejection leads to removing the allegedly bad validation code from the
chain storage. Among other things, this implies that the operation can
be re-requested. That allows for retrying an operation in case there was
some bug. At the same time it does not look as a DoS vector due to the
caching performed by the nodes.

PVF pre-checking can be enabled and disabled. Initially, according to
the changes in #4420, this mechanism is disabled. Triggering the PVF
pre-checking when it is disabled just means that we insta approve the
requesting operation. This should lead to the behavior being unchanged.

Follow-ups:

- expose runtime APIs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=polkadot-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/polkadot/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=westend-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/westend/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=kusama-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/kusama/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features runtime-benchmarks -- benchmark --chain=rococo-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/rococo/src/weights/runtime_parachains_paras.rs

* Review fixes

Co-authored-by: Parity Bot <[email protected]>
pepyakin added a commit that referenced this issue Dec 30, 2021
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.

TODO:

- [ ] Run once more with polkadot-launch
pepyakin added a commit that referenced this issue Dec 30, 2021
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.

TODO:

- [ ] Run once more with polkadot-launch
pepyakin added a commit that referenced this issue Dec 30, 2021
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.

TODO:

- [ ] Run once more with polkadot-launch
pepyakin added a commit that referenced this issue Jan 3, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.

TODO:

- [ ] Run once more with polkadot-launch
pepyakin added a commit that referenced this issue Jan 3, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.

TODO:

- [ ] Run once more with polkadot-launch
drahnr pushed a commit that referenced this issue Jan 4, 2022
* pvf-precheck: Integrate PVF pre-checking into paras module

Closes #4009

This is the most of the runtime-side change needed for #3211.

Here is how it works.

The PVF pre-checking can be triggered either by an upgrade or by
onboarding (i.e. calling `schedule_para_initialize`). The PVF
pre-checking process is identified by the PVF code hash that is being
voted on. If there is already PVF pre-checking process running, then no
new PVF pre-checking process will be started. Instead, we just subscribe
to the existing one.

If there is no PVF pre-checking process running but the PVF code hash
was already saved in the storage, that necessarily means (I invite the
reviewers to double-check this invariant) that the PVF already passed
pre-checking. This is equivalent to instant approving of the PVF.

The pre-checking process can be concluded either by obtaining a
supermajority or if it expires.

Each validator checks the list of PVFs available for voting. The vote is
binary, i.e. accept or reject a given PVF. As soon as the supermajority
of votes are collected for one of the sides of the vote, the voting is
concluded in that direction and the effects of the voting are enacted.

Only validators from the active set can participate in the vote. The set
of active validators can change each session. That's why we reset the
votes each session. A voting that observed a certain number of sessions
will be rejected.

The effects of the PVF accepting depend on the operations requested it:

1. All onboardings subscribed to the approved PVF pre-checking process will
get scheduled and after passing 2 session boundaries they will be onboarded.
2. All upgrades subscribed to the approved PVF pre-checking process will
get scheduled very similarly to the existing process. Upgrades with
pre-checking are really the same process that is just delayed by the
time required for pre-checking voting. In case of instant approval the
mechanism is exactly the same. This is important from parachains
compatibility standpoint since following the delayed upgrade requires
the parachain to implement
paritytech/cumulus#517.

In case, PVF pre-checking process was concluded with rejection, then all
the requesting operations get cancelled. For onboarding it means it gets
without movement: the lifecycle of such parachain is terminated on the
`Onboarding` state and after rejection the lifecycle is none. That in
turn means that the caller can attempt registering the parachain once
more. For upgrading it means that the upgrade process is aborted: that
flashes go-ahead signal with `Abort` flag.

Rejection leads to removing the allegedly bad validation code from the
chain storage. Among other things, this implies that the operation can
be re-requested. That allows for retrying an operation in case there was
some bug. At the same time it does not look as a DoS vector due to the
caching performed by the nodes.

PVF pre-checking can be enabled and disabled. Initially, according to
the changes in #4420, this mechanism is disabled. Triggering the PVF
pre-checking when it is disabled just means that we insta approve the
requesting operation. This should lead to the behavior being unchanged.

Follow-ups:

- expose runtime APIs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=polkadot-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/polkadot/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=westend-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/westend/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features=runtime-benchmarks -- benchmark --chain=kusama-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/kusama/src/weights/runtime_parachains_paras.rs

* cargo run --quiet --release --features runtime-benchmarks -- benchmark --chain=rococo-dev --steps=50 --repeat=20 --pallet=runtime_parachains::paras --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --header=./file_header.txt --output=./runtime/rococo/src/weights/runtime_parachains_paras.rs

* Review fixes

Co-authored-by: Parity Bot <[email protected]>
pepyakin added a commit that referenced this issue Jan 4, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.

TODO:

- [ ] Run once more with polkadot-launch
pepyakin added a commit that referenced this issue Jan 5, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.
pepyakin added a commit that referenced this issue Jan 7, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.
pepyakin added a commit that referenced this issue Jan 7, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.
paritytech-processbot bot pushed a commit that referenced this issue Jan 7, 2022
This commit implements the last major piece of #3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.
Wizdave97 pushed a commit to ComposableFi/polkadot that referenced this issue Feb 3, 2022
This commit implements the last major piece of paritytech#3211: the subsystem that
tracks PVFs that require voting, issues pre-check requests to
candidate-validation and makes sure that the votes are submitted to the
chain.
@eskimor eskimor added the T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes. label Aug 24, 2022
@eskimor eskimor assigned slumber and unassigned pepyakin Aug 24, 2022
ggwpez pushed a commit to ggwpez/runtimes that referenced this issue Mar 10, 2023
This PR is a part of
paritytech/polkadot#3211.

This PR prepares ground for the following runtime changes required for
PVF pre-checking. Specifically, we do several changes here:

1. We remove `validation_code_at` and `validation_code_hash_at`. Those
   functions are not used. They were added in the early days with intent
   to use it later but turned out that we do not need them.
2. We replace `validation_code_hash_at` with just `current_code_hash`
   for the case of inclusion and candidate checking.
3. We also replace `last_code_upgrade` with a direct query into
   `FutureCodeHash` and `UpgradeRestrictionSignal`. Those in conjunction
   should replace the logic that was used for allowing/disallowing
   upgrades. This requires special attention of the reviewers.
4. Then we remove the machinery required to support those queries.
   Specifically the code related to `UseCodeAt`. We do not need it since
   we do not answer the historical queries. However, we still leave all
   the data on-chain. At some point we may clean it up, but that would
   be needed to be done with a dedicated migration which can be done as
   follow-up.
5. Some now irrelevant tests were removed and/or adapted.
@eskimor
Copy link
Member

eskimor commented Apr 17, 2023

@slumber There are a few check boxes left, can you make sure they are taken care of please. Thank you!

@slumber
Copy link
Contributor

slumber commented Apr 17, 2023

@slumber There are a few check boxes left, can you make sure they are taken care of please. Thank you!

This was my intention right after async backing work on the cumulus side, no worries.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I2-security The node fails to follow expected, security-sensitive, behaviour. T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes.
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants