
runtime upgrade: Methods to avoid ever including parachain code in critical-path data #967

Open
rphmeier opened this issue Jun 19, 2021 · 5 comments
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments


rphmeier commented Jun 19, 2021

When performing parachain code upgrades, we currently include the new parachain code.

  1. When announcing the code upgrade, the full code appears in the candidate receipt.
  2. When applying the code upgrade, the full code appears in the PoV, as it moves from one section of the state to the next.

Empirically, the code we see for parachains is quite large, typically in the 500K to 800K range (Sergei: I observed PVFs up to a couple of megabytes).

Avoiding code in the critical path is important because it reduces friction at runtime upgrade points, if backing groups have relatively low bandwidth. It's not unreasonable for upgrade blocks to take a few minutes to get backed in the status quo. It also makes the code size more independent from the PoV size, opening up the opportunity for parachain developers to build more complex runtimes without being affected by restrictions targeting critical-path bandwidth.

This issue will be split into two sections, one for each of these points.

Solving Code in Candidate Receipts: Hash-based announcements

At the moment, PVFs announce code upgrades by returning the full code when it's allowed, according to the state root of the relay chain. This code then appears, in full, in the candidate commitments. These candidate commitments are, in turn, gossiped among all validators so they can be included into the relay chain by the block author, who is most likely not a backer of the parachain doing the code upgrade.

An improvement to this situation would be for the PVF to only output the hash and size of the code, for inclusion in the candidate commitments.

Upon reaching the relay chain, the future-code announcement opens a grace period during which any user of the relay chain can upload the code via an UnsignedTransaction. These uploads are not on the critical path of parachain execution, and parachain code upgrades need to be delayed anyway for other reasons (see paritytech/polkadot#3211). Once the code is actually uploaded, the relay chain is ready for the parachain to upgrade, and after the code_upgrade_delay specified in the HostConfiguration, the code can be applied at any time.
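A minimal sketch of this announce-then-upload flow, under stated assumptions: the struct and method names are hypothetical, and `DefaultHasher` stands in for the real 256-bit code hash (blake2_256 in Polkadot) purely so the example is self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake2_256; used here only to keep the sketch self-contained.
fn code_hash(code: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    code.hash(&mut h);
    h.finish()
}

/// What the PVF would commit to instead of the full code blob.
struct CodeUpgradeAnnouncement {
    code_hash: u64,
    code_size: u32,
}

/// Relay-chain side: an announcement waiting for the matching upload.
struct PendingUpgrade {
    announcement: CodeUpgradeAnnouncement,
    uploaded: Option<Vec<u8>>,
}

impl PendingUpgrade {
    /// Accept an upload only if it matches the announced hash and size;
    /// anyone may submit it, off the critical path of parachain execution.
    fn try_upload(&mut self, code: Vec<u8>) -> Result<(), &'static str> {
        if self.uploaded.is_some() {
            return Err("already uploaded");
        }
        if code.len() as u32 != self.announcement.code_size {
            return Err("size mismatch");
        }
        if code_hash(&code) != self.announcement.code_hash {
            return Err("hash mismatch");
        }
        self.uploaded = Some(code);
        Ok(())
    }
}
```

Once `uploaded` is `Some`, the code-upgrade delay clock can start; the full blob never appeared in a candidate receipt.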

Solving Code in upgrade parablock PoVs: Move code to the PVF parameters and the AvailableData.

When a parachain actually triggers its code upgrade, in practice, it involves the PVF moving the new code from one section of the trie to another. Although this is not strictly necessary within the parachain execution model, Cumulus-based parachains store their code in the state trie.

There are two approaches I considered to solve this problem:

  1. Write PVFs in such a way that the code doesn't need to be moved around in its state. What this means for Cumulus is making :code less special, or giving a way for :code to specify some other trie node which actually holds the real code.
  2. Pass the code, optionally, into the PVF, so it doesn't need to be loaded from the PoV. When it is passed, keep it available in the AvailableData.

The problem with approach 1 is that although we avoid including the code in the PoV at the point of the upgrade, we still have to include the code in the PoV in some other block, where the code was moved into the storage of the parachain. This makes it a non-solution, so we'll ignore it and look at approach 2.

The idea of approach 2 is to make 3 alterations to parachain primitives:

```rust
struct AvailableData {
    // other fields..
    code: Option<ValidationCode>, // new
}

struct CandidateDescriptor {
    // other fields..
    applies_upgrade: bool, // new
}

struct ValidationParams {
    // other fields..
    code: Option<ValidationCode>, // new
}
```

With these changes, we continue to make the code available in the erasure-coding of the AvailableData that is kept by the entire validator-set, but it no longer needs to be sent explicitly between the collator and the backers or between the backers. Instead, if applies_upgrade is true, the backers can draw the code from other sources. At the moment, scheduled validation code is stored on-chain, but even in the future, when validation code is stored off-chain, the backing validators will have it to pass into the PVF.

Since the backing pipeline is the critical path, reducing the bandwidth between these actors will have a huge beneficial effect on the performance of the blocks applying runtime upgrades.

It is illegal for the CandidateDescriptor to contain applies_upgrade == true if the context it is executed in does not have a scheduled code upgrade for the parachain. Honest backers will only place Some into AvailableData::code if applies_upgrade == true. The runtime of the relay chain will reject all such candidates, so it's known that every candidate receipt that appears on-chain, pre-availability, correctly indicates whether AvailableData::code should contain Some.

As an approval checker or a dispute participant, if applies_upgrade == false and AvailableData::code is Some, the candidate is invalid, and vice-versa. This means that any malicious backers which have managed to include a false AvailableData are slashed, and also that the candidate won't be finalized. This check is safe, because anything that has been included has already passed the runtime check in the past. If these checks pass, and AvailableData::code is Some, then it should be passed into the PVF during the approval check.
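The consistency rule above can be stated as a one-line predicate. This is a sketch with a hypothetical function name; `Vec<u8>` stands in for ValidationCode.

```rust
/// Rule applied by approval checkers and dispute participants:
/// `AvailableData::code` must be `Some` exactly when the descriptor
/// claims the candidate applies a code upgrade. Any mismatch makes the
/// candidate invalid and slashes the backers that included it.
fn code_field_consistent(applies_upgrade: bool, code: &Option<Vec<u8>>) -> bool {
    applies_upgrade == code.is_some()
}
```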

Lastly on the core protocol side, the only thing that the parachain storage needs to store is the hash of the upcoming code. When the PVF accepts the new code, it can check that the code passed in hashes to the correct value, and then write it to its state. Writes to state don't affect the PoV size significantly as a general rule, and especially not when a trie node with the given key (:code in practice) is already present.
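A sketch of that PVF-side check, under assumptions: the function name is hypothetical, state is modelled as a plain key-value map, and `DefaultHasher` stands in for the real code hash.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the real code hash (blake2_256 in practice).
fn sketch_hash(code: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    code.hash(&mut h);
    h.finish()
}

/// PVF side: parachain storage holds only the hash of the upcoming
/// code. When the upgrade is applied, verify the code passed in as a
/// validation parameter against that hash, then write it to state.
fn apply_code_upgrade(
    scheduled_code_hash: u64,
    supplied_code: &[u8],
    state: &mut HashMap<Vec<u8>, Vec<u8>>,
) -> Result<(), &'static str> {
    if sketch_hash(supplied_code) != scheduled_code_hash {
        return Err("supplied code does not hash to the scheduled value");
    }
    // Overwriting `:code` is a plain state write: neither the old nor
    // the new code blob needs to appear in the PoV.
    state.insert(b":code".to_vec(), supplied_code.to_vec());
    Ok(())
}
```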

Implementing this new PVF for Cumulus nodes poses a small additional challenge because of its requirements:

  1. The head-data from the PVF needs to match the header of the actual block in the associated Substrate chain.
  2. The entire Cumulus chain should be executable from the genesis with nothing other than a state DB.

From these requirements, it's clear that the actual blocks that Cumulus nodes synchronize, store, and execute, need to contain the new code at some point in the chain. So the challenge is to find a way to do this in a way where the PoV never does.

The solution that I propose is to have a special inherent, something like this:

```rust
ApplyNewCode(ValidationCode),
```

This is what appears in the full block outside of the PVF. However, what appears in the PVF is a slightly modified version:

```rust
ApplyNewCode, // Parameter is implicit.
```

In the initial stages of PVF execution, if this inherent is found, then the PVF must have accepted Some(validation_code) as its argument, or the inputs are invalid. It can replace the stub inherent with the full version. This achieves all three goals: the produced Cumulus blockchain contains the new code, the PVF produces head data that matches the Cumulus blockchain, and the PoV never contains any code.
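The stub-expansion step can be sketched as follows. The enum and function names are hypothetical, and `Vec<u8>` stands in for ValidationCode; real inherents would carry more variants.

```rust
/// What appears in the full Cumulus block, outside the PVF.
enum FullInherent {
    ApplyNewCode(Vec<u8>),
}

/// What appears inside the PoV: the parameter is implicit.
enum StubInherent {
    ApplyNewCode,
}

/// Early in PVF execution: if the stub inherent is present, the PVF
/// must have received `Some(code)` in its validation parameters, and
/// the stub is expanded back to the full inherent; otherwise the
/// inputs are invalid.
fn expand_stub(
    stub: StubInherent,
    param_code: Option<Vec<u8>>,
) -> Result<FullInherent, &'static str> {
    match (stub, param_code) {
        (StubInherent::ApplyNewCode, Some(code)) => Ok(FullInherent::ApplyNewCode(code)),
        (StubInherent::ApplyNewCode, None) => Err("stub inherent present but no code parameter"),
    }
}
```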


burdges commented Jun 19, 2021

I've lost our recent issue discussing this now, but.. We could handle :code as a distinguished recent parachain block, not state data. We'd need a mechanism for fetching a recent block from availability, which sounds heavy, but parachains could bypass this cost because nodes already cached their build.

We could reuse this technique for MEV protections in parachains too:

We split the parachain block into "run now" and "run 10 slots in the future", perhaps by pushing a bunch of transactions into the state, but preferably by splitting the availability encoding. In other words, we ideally make block n actually process the block of transactions placed into availability by block n-10. We then permute the transaction order by the relay chain randomness, so transactions could now fail, but block n's backing checker marks the bad ones. This provides MEV protection.
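The permutation step described above might look something like this. Purely a hypothetical sketch: the relay-chain randomness is modelled as a plain u64 seed driving a xorshift64 generator, so that every validator derives the same order from the same randomness.

```rust
/// Deterministically shuffle the delayed transactions using the
/// relay-chain randomness (here: a u64 seed).
fn permute_by_randomness<T>(txs: &mut [T], mut seed: u64) {
    // Fisher-Yates shuffle driven by a tiny xorshift64 generator.
    for i in (1..txs.len()).rev() {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        let j = (seed % (i as u64 + 1)) as usize;
        txs.swap(i, j);
    }
}
```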

We'll include two ephemeral decryption keys associated to sassafras slot assignment proofs, for which the upcoming block producer knows the secret key. When the block producer makes their sassafras block, they delete the first secret key, as they've already decrypted any transactions, and then publish the second in the header. We turn this into even stronger MEV protection by decrypting in block n the transactions placed into availability in block n-10, using the keys published by the intervening blocks. In other words, we'd prevent MEV by running something vaguely like mixnet-style decryption on-chain.

In both cases, we need a whole block to either hang out in state for 10 slots or else provide some means by which the block 10 slots later fetches it from availability.

@rphmeier
Contributor Author

@burdges

I'm generally not a fan of the "treat code upgrades as a special block" because it's unclear how Cumulus should handle that block. As mentioned in the issue, we have the goal that the produced Cumulus chain can be synchronized entirely on its own. I don't think we could do the 'special block' thing unless we altered Substrate itself to support those types of special blocks. That sounds really difficult so it's a class of solution I would prefer to avoid.


burdges commented Jun 19, 2021

Yes, it'd ask Substrate to treat special blocks like detached state data and alter pruning rules, so yes, it touches several things and I'm unsure of the complexity. It's roughly your 2 though, no?

Also, where was the other recent issue you opened on this? I'd started to come around to your perspective, but now lost my train of thought..

I should reread my own thoughts in paritytech/polkadot#3211 too. ;)


cheme commented Jun 21, 2021

the PVF must have accepted Some(validation_code) as its argument or the inputs are invalid.

Not sure if there is a way to avoid putting 'validation_code' in memory when running the PVF?

Does not seem easy without specific validation of block data in Polkadot (or some mechanism involving a specific host function that would build some specific hashing with the external validation_code, and thus a validation function a bit different than the runtime, or an overload of a host function for it, as currently done for diverging code).


rphmeier commented Jun 21, 2021

@cheme We don't care (that much) about memory usage. This is about PoV size. I am not sure you have understood the issue well enough.

The only change this needs on the trie side is to make sure that when overwriting, but not reading, :code, the old value of :code does not appear in the PoV. Everything else will be handled by the parachain protocol changes described in this issue. But this trie optimization is out of scope for the issue and should be discussed elsewhere.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. and removed I10-optimisation labels Aug 25, 2023
@eskimor eskimor changed the title Methods to avoid ever including parachain code in critical-path data runtime upgrade: Methods to avoid ever including parachain code in critical-path data Jan 30, 2024