Maybe closes #64 #6029

Draft
wants to merge 21 commits into
base: master

ordian
Member

@ordian ordian commented Oct 11, 2024

Closes #64

This PR implements approach 4 outlined in the issue.

Unresolved questions:

  • Does the CallContext::{Offchain, Onchain} determine whether it's runtime calls / (block production or import)?
  • dryRun RPC call will not work properly on the block setting the pending code, right? Is that a problem? A: No, transactions are invalidated anyway by runtime upgrades because the spec version changes.
  • Do we want to pass IgnorePendingCode regardless in the higher level APIs?
  • Do we want to emit an event when pending code is set?
  • Should the system_version check also be done on the node side (block production and import)?
  • What is the best place to call maybe_apply_pending_code_upgrade?

TODO:

ordian added 4 commits October 7, 2024 18:22
* master: (28 commits)
  `substrate-node`: removed excessive polkadot-sdk features (#5925)
  Rename QueueEvent::StartWork (#6015)
  [ci] Remove quick-benchmarks-omni from GitLab (#6014)
  Set larger timeout for cmd.yml (#6006)
  Fix `0003-beefy-and-mmr` test (#6003)
  Remove redundant XCMs from dry run's forwarded xcms (#5913)
  Add RadiumBlock bootnodes to Coretime Polkadot Chain spec (#5967)
  Bump strum from 0.26.2 to 0.26.3 (#5943)
  Add PVF execution priority (#4837)
  Snowbridge V2 docs (#5902)
  Fix u256 conversion in BABE (#5994)
  [ci] Move test-linux-stable-no-try-runtime to GHA (#5979)
  Bump PoV request timeout (#5924)
  [Release/CI] Github flow to build `polkadot`/`polkadot-parachain` rc binaries and deb package (#5963)
  [ci] Remove short-benchmarks from Gitlab (#5988)
  Disable flaky tests reported in 5972/5973/5974 (#5976)
  Bump some dependencies (#5886)
  bump zombienet version and set request for k8s (#5968)
  [omni-bencher] Make all runtimes work (#5872)
  Omni-Node renamings (#5915)
  ...
@bkchr
Member

bkchr commented Oct 11, 2024

  • dryRun RPC call will not work properly on the block setting the pending code, right? Is that a problem?

Why?

make sure this #64 (comment) is not a problem

This also happens in the block before. Not sure how this should be a problem in the runtime upgrade block. Yes, it decreases the available PoV size, but then we need to use multi block migrations.

@ordian
Member Author

ordian commented Oct 11, 2024

dryRun RPC call will not work properly on the block setting the pending code, right? Is that a problem?

Why?

If we try to check a transaction at the block that sets the new code as pending code, it will be checked with the old code (because the RPC uses a runtime call with offchain context), but the next block will use the new code. So even though a transaction might be valid with the old code, it might be invalid with the new one (or vice versa). Am I missing something here?
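To make the mismatch concrete, here is a minimal sketch of how code selection could depend on the call context. It is a toy model, not the PR's implementation: `CodeAt` and `code_for_call` are hypothetical names, and the local `CallContext` enum only mirrors the variant names mentioned in the unresolved questions. It assumes that on-chain execution on top of a state applies the pending code, while off-chain runtime calls keep using the active code.

```rust
/// Local stand-in mirroring the `CallContext::{Offchain, Onchain}` variants
/// mentioned above (assumption: defined here only for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CallContext {
    /// Runtime API / RPC style calls, e.g. transaction validation.
    Offchain,
    /// Block production or import on top of the given state.
    Onchain,
}

/// Hypothetical view of the code-related entries in the state of block N.
pub struct CodeAt<'a> {
    /// The currently active code.
    pub code: &'a str,
    /// Code scheduled by an upgrade in block N, if any.
    pub pending_code: Option<&'a str>,
}

/// Which code would execute for a call on top of `state` under the
/// assumption described in the lead-in.
pub fn code_for_call<'a>(state: &CodeAt<'a>, ctx: CallContext) -> &'a str {
    match (ctx, state.pending_code) {
        // Building or importing block N + 1 picks up the pending code.
        (CallContext::Onchain, Some(pending)) => pending,
        // Everything else still sees the active code.
        _ => state.code,
    }
}
```

With pending code set, an off-chain call on block N resolves to the old code while the on-chain execution of block N + 1 resolves to the new one, which is exactly the window in which a validity check and the actual inclusion can disagree.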

@bkchr
Member

bkchr commented Oct 12, 2024

Ahh, you meant validate_transaction. Transactions are invalidated anyway by runtime upgrades because the spec version changes. Runtime upgrades also do not happen that regularly, so I think it is fine.

@ordian
Member Author

ordian commented Oct 12, 2024

This also happens in the block before. Not sure how this should be a problem in the runtime upgrade block. Yes, it decreases the available PoV size, but then we need to use multi block migrations.

For this item, what I want to test is a runtime upgrade with code close to max size to make sure it fits into PoV. My worry is that it will be included twice (as part of block body and state proof in :pending_code or twice in state proof). Given that max code size is set as 3mb, two of them might not fit into PoV limits (unless we bump them), although compression might save us here.
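The size concern can be sketched as simple arithmetic. The 3 MiB code size follows the "3mb" figure above (read here as MiB, which is an assumption); the PoV limit and the compression ratio passed in are hypothetical parameters, not values taken from any chain configuration.

```rust
/// One mebibyte in bytes.
pub const MIB: u64 = 1024 * 1024;

/// Bytes that `copies` copies of a runtime blob contribute to the PoV,
/// given an assumed compressed size in percent of the original
/// (100 = incompressible).
pub fn code_bytes_in_pov(code_size: u64, copies: u64, compressed_percent: u64) -> u64 {
    code_size * copies * compressed_percent / 100
}

/// Whether the code plus all other PoV contents stay within `pov_limit`.
pub fn fits(code_bytes: u64, other_bytes: u64, pov_limit: u64) -> bool {
    code_bytes + other_bytes <= pov_limit
}
```

For example, with a hypothetical 5 MiB limit, `fits(code_bytes_in_pov(3 * MIB, 2, 100), 0, 5 * MIB)` is false, while a single copy, or two copies compressed to 50 percent, would fit. Whether real runtime blobs compress that well inside a PoV is exactly what the test mentioned above would need to establish.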

@ordian
Member Author

ordian commented Oct 15, 2024

Another question I have is that it seems to me that we only start compiling the new code with wasmtime when we call

pub fn with_instance<'c, H, R, F>(

In that case, it means that currently, we will compile the new runtime right after importing the block N:

┌────────────────────────────────────────────────────────────────┐
│┌────────────────────┐┌──────────────────┐┌────────────────────┐│
││                    ││                  ││┌────┐              ││
││      block #N      ││   compile new    │││migr│ block #N+1   ││
││                    ││     runtime      │││atio│              ││
││                    ││                  ││└────┘              ││
│└────────────────────┘└──────────────────┘└────────────────────┘│
└────────────────────────────────────────────────────────────────┘

because we will call some runtime API after importing it. However, with this PR, we will only start compiling the new runtime when we start importing block N+1 (unless we are a block producer) as the runtime API calls for block N use old code:

┌─────────────────────────────────────────────────────────────────────────┐
│┌────────────────────┐┌──────────────────┐┌─────────────────────────────┐│
││                    ││                  ││┌──────┐┌────┐               ││
││      block #N      ││      :(          │││compi ││migr│  block #N+1   ││
││                    ││                  │││lation││atio│               ││
││                    ││                  ││└──────┘└────┘               ││
│└────────────────────┘└──────────────────┘└─────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────┘

Is that correct, or am I missing something?

@bkchr
Member

bkchr commented Oct 15, 2024

My worry is that it will be included twice (as part of block body and state proof in :pending_code or twice in state proof).

How? Parachains are not upgrading instantly. They first need to announce it to the relay chain, and even if they did it instantly, :pending_code is moved to :code in X + 1 while X contained the transaction that upgraded the node (but as I said before, this is not possible for parachains anyway).
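The two-step flow described here (set :pending_code in block X, move it to :code in X + 1) can be modelled with a toy key-value state. This is only a sketch under stated assumptions: `State` is a hypothetical in-memory map rather than the trie-backed storage, and the method named after `maybe_apply_pending_code_upgrade` is a stand-in for illustration, not the function from the PR.

```rust
use std::collections::BTreeMap;

/// Storage keys as they are written in this discussion.
const CODE: &[u8] = b":code";
const PENDING_CODE: &[u8] = b":pending_code";

/// Hypothetical in-memory state standing in for the real storage.
#[derive(Default)]
pub struct State {
    kv: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl State {
    /// Creates a state whose active code is `code`.
    pub fn with_code(code: &[u8]) -> Self {
        let mut state = State::default();
        state.kv.insert(CODE.to_vec(), code.to_vec());
        state
    }

    /// Block X: the upgrade transaction only records the new code as pending.
    pub fn set_pending_code(&mut self, new_code: &[u8]) {
        self.kv.insert(PENDING_CODE.to_vec(), new_code.to_vec());
    }

    /// Block X + 1: move the pending code into place, if there is any.
    /// Returns `true` when an upgrade was applied.
    pub fn maybe_apply_pending_code_upgrade(&mut self) -> bool {
        match self.kv.remove(PENDING_CODE) {
            Some(code) => {
                self.kv.insert(CODE.to_vec(), code);
                true
            }
            None => false,
        }
    }

    /// The currently active code.
    pub fn code(&self) -> Option<&[u8]> {
        self.kv.get(CODE).map(Vec::as_slice)
    }

    /// The code scheduled for the next block, if any.
    pub fn pending_code(&self) -> Option<&[u8]> {
        self.kv.get(PENDING_CODE).map(Vec::as_slice)
    }
}
```

In this model the blob lives under exactly one of the two keys at the end of each block: after `set_pending_code` the active code is unchanged, and after `maybe_apply_pending_code_upgrade` the pending entry is gone.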

@bkchr
Member

bkchr commented Oct 15, 2024

because we will call some runtime API after importing it. However, with this PR, we will only start compiling the new runtime when we start importing block N+1 (unless we are a block producer) as the runtime API calls for block N use old code:

While that is correct, it doesn't always have to be the case. We could add this optimization to ensure that the code is compiled after importing a block and before building a new one. But this is not really a requirement for this PR IMO.
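One way such an optimization could look, purely as a sketch: warm a compiled-runtime cache right after import whenever the imported block left pending code behind. `RuntimeCache` and `on_block_imported` are hypothetical names, and the "compilation" is simulated with a string; nothing here reflects the actual executor or its caching API.

```rust
use std::collections::HashMap;

/// Hypothetical cache of compiled runtimes, keyed by the raw code bytes.
pub struct RuntimeCache {
    compiled: HashMap<Vec<u8>, String>,
    /// Number of (simulated) compilations performed so far.
    pub compilations: usize,
}

impl RuntimeCache {
    pub fn new() -> Self {
        RuntimeCache { compiled: HashMap::new(), compilations: 0 }
    }

    /// Returns the compiled artifact for `code`, compiling on a cache miss.
    pub fn get_or_compile(&mut self, code: &[u8]) -> &str {
        if !self.compiled.contains_key(code) {
            // Stand-in for the expensive compilation step.
            self.compilations += 1;
            self.compiled
                .insert(code.to_vec(), format!("compiled({} bytes)", code.len()));
        }
        &self.compiled[code]
    }

    /// Hypothetical hook run after importing block N: if the block set
    /// pending code, compile it now instead of at the start of block N + 1.
    pub fn on_block_imported(&mut self, pending_code: Option<&[u8]>) {
        if let Some(code) = pending_code {
            self.get_or_compile(code);
        }
    }
}
```

With this hook, the later lookup for block N + 1 is a cache hit, so the compilation cost moves back into the gap between the two blocks, as in the first diagram above.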

ordian and others added 12 commits October 16, 2024 15:07
* master: (129 commits)
  pallet-revive: Use `RUSTUP_TOOLCHAIN` if set (#6365)
  [eth-rpc] proxy /health (#6360)
  [Release|CI/CD] adjust release pipelines (#6366)
  Bump the known_good_semver group across 1 directory with 3 updates (#6339)
  Run check semver in MQ (#6287)
  [Deprecation] deprecate treasury `spend_local` call and related items (#6169)
  refactor and harden check_core_index (#6217)
  litep2p: Update litep2p to v0.8.0 (#6353)
  [pallet-staking] Additional check for virtual stakers (#5985)
  migrate pallet-remarks to v2 bench syntax (#6291)
  Remove leftover references of Wococo (#6361)
  snowbridge: allow account conversion for Ethereum accounts (#6221)
  authority-discovery: Populate DHT records with public listen addresses (#6298)
  Bounty Pallet: add `approve_bounty_with_curator` call to `bounties` pallet (#5961)
  Silent annoying log (#6351)
  [pallet-revive] rework balance transfers (#6187)
  `statement-distribution`: RFC103 implementation (#5883)
  Disable flaky tests reported in #6343 / #6345 (#6346)
  migrate pallet-recovery to benchmark V2 syntax (#6299)
  inclusion emulator: correctly handle UMP signals (#6178)
  ...
* master: (256 commits)
  fix chunk fetching network compatibility zombienet test (#6988)
  chore: delete repeat words (#7034)
  Print taplo version in CI (#7041)
  Implement cumulus StorageWeightReclaim as wrapping transaction extension + frame system ReclaimWeight (#6140)
  Make `TransactionExtension` tuple of tuple transparent for implication (#7028)
  Replace duplicated whitelist with whitelisted_storage_keys (#7024)
  [WIP] Fix networking-benchmarks (#7036)
  [docs] Fix release naming (#7032)
  migrate pallet-mixnet to umbrella crate (#6986)
  Improve remote externalities logging (#7021)
  Fix polkadot sdk doc. (#7022)
  Remove warning log from frame-omni-bencher CLI (#7020)
  [pallet-revive] fix file case (#6981)
  Add workflow for networking benchmarks (#7029)
  [CI] Skip SemVer on R0-silent and update docs (#6285)
  correct path in cumulus README (#7001)
  sync: Send already connected peers to new subscribers (#7011)
  Excluding chainlink domain for link checker CI (#6524)
  pallet-bounties: Fix benchmarks for 0 ED (#7013)
  Log peerset set ID -> protocol name mapping (#7005)
  ...
* master:
  workflows: add debug input for sync templates act (#7057)
  Remove usage of `sp-std` from Substrate (#7043)
  Fix typos (#7027)
  [core-fellowship] Add permissionless import_member (#7030)
  Avoid incomplete block import pipeline with full verifying import queue (#7050)
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12650411555
Failed job name: fmt

@ordian
Member Author

ordian commented Jan 14, 2025

@bkchr any feedback on the questions and the implementation?

@skunert skunert self-requested a review January 27, 2025 13:47
Successfully merging this pull request may close these issues.

After runtime upgrade, Runtime API calls use new code with unmigrated storage
2 participants