
Block building within the same wasm memory? #10557

pepyakin opened this issue Dec 24, 2021 · 8 comments
@pepyakin
Contributor

This is not necessarily a feature request or call to action, but just writing down the thoughts on this topic.

Whenever Substrate imports a block, it calls a runtime API function exposed as execute_block. Under the hood, that call initializes the state of the block, picks and runs each and every extrinsic, and then finalizes the block, all within the same wasm instance. This means that memory is persistent across these stages: simply speaking, changes to memory made in on_initialize will be visible in on_finalize.

However, in contrast to this, while building a block, each stage is its own runtime call: initializing the block is one, applying an extrinsic is another.

This means that FRAME or other runtime code cannot assume that the memory is persistent between the calls.
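
For illustration, the two call patterns look roughly like this (a hypothetical trait and helper names, not the actual Substrate executor API):

```rust
/// Hypothetical abstraction over a wasm runtime instance; the real
/// Substrate executor API looks different, this only illustrates the
/// two call shapes.
trait RuntimeInstance {
    fn initialize_block(&mut self, header: &[u8]);
    fn apply_extrinsic(&mut self, ext: &[u8]);
    fn finalize_block(&mut self) -> Vec<u8>;
}

/// Import path: everything runs inside one instance, so memory written
/// in on_initialize is still there in on_finalize.
fn execute_block(instance: &mut dyn RuntimeInstance, header: &[u8], extrinsics: &[Vec<u8>]) {
    instance.initialize_block(header);
    for ext in extrinsics {
        instance.apply_extrinsic(ext);
    }
    instance.finalize_block();
}

/// Authoring path today: every stage gets a fresh instance, so memory
/// does not survive from one call to the next.
fn build_block(
    mut new_instance: impl FnMut() -> Box<dyn RuntimeInstance>,
    header: &[u8],
    extrinsics: &[Vec<u8>],
) {
    new_instance().initialize_block(header);
    for ext in extrinsics {
        new_instance().apply_extrinsic(ext);
    }
    new_instance().finalize_block();
}
```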

There are a number of reasons why persistent memory may be desirable:

  1. in my experience working with contracts, there were several times when I wished the memory were preserved between calls.
  2. Get rid of junk in storage proofs. #9170 is another case, which is probably solved via other means, but still.
  3. passing data between on_initialize and on_finalize without touching storage (see the sketch after this list). It happens fairly often that we need to do something in on_finalize, for example remove N elements from a list that satisfy some condition. However, using on_finalize requires on_initialize to return the weight to be consumed by on_finalize, implying that on_initialize needs to see how many items satisfy the predicate, thus obtaining N. Then, even though on_initialize already did the work, on_finalize has to run the same code again to decide which elements to prune. It would be good if on_initialize could just communicate to on_finalize which items it needs to remove.
  4. this also ties back to the idea of adding ephemeral storage to the Substrate runtime as some sort of host support via a custom child trie. It seems that storing the values in memory could solve this issue in a more elegant way, without introducing an ephemeral child trie. Not sure if all use-cases can be covered by it though.
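
As a rough illustration of point 3: if memory were guaranteed to persist between the two hooks, a pallet could stash the result of its scan in instance memory instead of re-running the predicate or round-tripping through storage. A minimal plain-Rust sketch, assuming hypothetical names and a guaranteed shared instance (the real runtime is no_std, so it would need a different mechanism than thread_local!):

```rust
use std::cell::RefCell;

thread_local! {
    // Wasm is single-threaded, so this effectively behaves like
    // instance-global memory.
    static PENDING_REMOVALS: RefCell<Vec<u32>> = RefCell::new(Vec::new());
}

/// Scan once, remember which items satisfy the predicate, and return the
/// weight that on_finalize will consume.
fn on_initialize(items: &[(u32, bool)]) -> u64 {
    let to_remove: Vec<u32> = items
        .iter()
        .filter(|(_, expired)| *expired)
        .map(|(id, _)| *id)
        .collect();
    let weight = 10_000 * to_remove.len() as u64; // made-up per-item weight
    PENDING_REMOVALS.with(|p| *p.borrow_mut() = to_remove);
    weight
}

/// Reuse the result of the earlier scan instead of re-running the
/// predicate or passing the list through storage.
fn on_finalize(remove: impl Fn(u32)) {
    PENDING_REMOVALS.with(|p| {
        for id in p.borrow_mut().drain(..) {
            remove(id);
        }
    });
}
```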

This also ties back to the issue of wasm instance spawning overhead. During the recent work on #10244 we found out that right now the per-call instance overhead is at least ≈50µs. If we take our target of 1000 tps for Polkadot, we get 12k transactions per parablock (1000 tps over a 12s parachain block), so in Cumulus we would spend roughly 12,000 × 50µs ≈ 600ms on wasm instance spawning overhead alone, a good chunk of time by any measure. While this is not critical and possibly we will get bottlenecked somewhere else, it is still something to keep in mind.

One approach to tackle this is to simply keep an instance between the runtime API calls, instead of creating a new one each time.
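
As a sketch of what instance reuse could look like on the node side with wasmtime (assuming the wasmtime and anyhow crates; the export names and zero-argument signatures below are simplified placeholders, real runtime calls go through a pointer/length ABI and need host-function imports):

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "runtime.wasm")?;

    // One store + instance, reused for every runtime API call, so the
    // linear memory (and anything cached in it) survives between calls.
    let mut store = Store::new(&engine, ());
    // No imports for simplicity; a real runtime needs its host functions provided.
    let instance = Instance::new(&mut store, &module, &[])?;

    // Placeholder exports taking no arguments; the real runtime API
    // passes SCALE-encoded (ptr, len) arguments instead.
    let init = instance.get_typed_func::<(), ()>(&mut store, "initialize_block")?;
    let apply = instance.get_typed_func::<(), ()>(&mut store, "apply_extrinsic")?;
    let finalize = instance.get_typed_func::<(), ()>(&mut store, "finalize_block")?;

    init.call(&mut store, ())?;
    for _ in 0..3 {
        apply.call(&mut store, ())?; // same memory every time
    }
    finalize.call(&mut store, ())?;
    Ok(())
}
```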

A similar issue was discussed between me and @rphmeier, where we came up with the idea of using internal iteration instead of external: that is, letting the runtime control the block filling. As a strawman, the block building interface would be a single call into the runtime. That call would initialize the block, then fetch and apply the extrinsics, and then finalize the block. The fetching part is the most interesting: the runtime calls a specific non-deterministic host function which returns the next transaction. That probably opens an entire can of tradeoffs that should be thought through, like how we handle timeouts and so on.
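
A minimal sketch of that strawman, with the non-deterministic transaction-fetching host call modeled as a closure and the runtime entry points stubbed out (all names hypothetical; a real design would have to pin down determinism, timeouts, and the actual ABI):

```rust
/// Runtime-driven block building: the loop lives inside the runtime and
/// pulls transactions from the host until the host says "stop".
fn build_block(header: &[u8], next_extrinsic: &mut dyn FnMut() -> Option<Vec<u8>>) -> Vec<u8> {
    initialize_block(header);
    while let Some(ext) = next_extrinsic() {
        // The runtime itself decides how to handle failures, weight
        // limits, when to close the block, and so on.
        let _ = apply_extrinsic(&ext);
    }
    finalize_block()
}

// Stubs standing in for the usual runtime entry points.
fn initialize_block(_header: &[u8]) {}
fn apply_extrinsic(_ext: &[u8]) -> Result<(), ()> { Ok(()) }
fn finalize_block() -> Vec<u8> { Vec::new() }
```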

One problem that prevents us from making the memory persistent between the stages of block execution is that some extrinsics can panic. Although this is not normal, it can potentially happen. When it does, we want to make sure that the already possible DoS vector is not amplified by implementation details. Giving each runtime call a new wasm instance is handy because a panicking instance can simply be destroyed and the block authorship module can move on. If we were to preserve the memory between calls, we would need to figure out how to recover from such a situation quickly.

This now brings us to classic ideas like paritytech/polkadot-sdk#370. With the integration of wasmtime we figured out that we probably should employ mmap/CoW techniques. More recently, we have been thinking about resurrecting the CoW approach to drive down the wasm spawning latency. Perhaps the very same mechanism may allow us to implement the last part of paritytech/polkadot-sdk#370 efficiently?
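
In its crudest form, i.e. eager copies instead of the mmap/CoW machinery mentioned above, recovery could amount to snapshotting the linear memory before each extrinsic and restoring it when the call traps. A hedged sketch of the idea, with made-up types:

```rust
/// Crude stand-in for a wasm linear memory.
struct LinearMemory(Vec<u8>);

/// Eager snapshot/restore; a real implementation would use mmap + CoW so
/// that taking the "snapshot" is (almost) free and only dirtied pages
/// ever get copied.
fn apply_with_rollback(
    memory: &mut LinearMemory,
    apply: impl FnOnce(&mut LinearMemory) -> Result<(), ()>,
) -> Result<(), ()> {
    let snapshot = memory.0.clone();
    match apply(memory) {
        Ok(()) => Ok(()),
        Err(e) => {
            // The extrinsic trapped or panicked: throw away whatever it
            // did to memory and carry on building the block.
            memory.0 = snapshot;
            Err(e)
        }
    }
}
```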

@shawntabrizi
Member

is that some extrinsics can panic

It would seem to me that a panic in the runtime is already a "game breaking" problem to have in the code, and thus we should not design a less efficient system just to better handle an already broken code path.

If saving a single wasm instance will provide a significant performance increase, I think it makes perfect sense.

@bkchr
Member

bkchr commented Dec 29, 2021

You cannot prove that there is never a panic anywhere. So, we need to take care of this.

@shawntabrizi
Member

We already assume a panic in the runtime is a DDoS vulnerability to a chain. Does the scale of the DDoS actually matter then?

@bkchr
Member

bkchr commented Dec 30, 2021

Here the problem could be that you stop block production on all validators. This is really bad! Currently, when you have one failing tx, nothing will happen. But if we cannot safely roll back, it means the entire block needs to be thrown away. Yeah, we could restart and skip transactions, but this needs to be really thought through :P

A panic could, for example, also happen because some storage entry cannot be decoded anymore. There are tons of reasons why the runtime can panic. I know that we try to prevent it, but we cannot prove this.

@kianenigma
Contributor

The fetching part is the most interesting: the runtime calls a specific non-deterministic host function which returns the next transaction. That probably opens an entire can of tradeoffs that should be thought through, like how we handle timeouts and so on.

The client could also heuristically pass a package of transactions into the runtime in a single API call, for example if the client is almost sure that all of the transactions in the pool can fit in the next block.
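
Roughly, instead of one apply_extrinsic call per transaction, a hypothetical runtime API entry point (names made up for illustration) could take the whole batch at once:

```rust
// Hypothetical runtime API: apply a whole batch in one wasm call, paying
// the per-call instantiation overhead once per batch rather than once
// per transaction.
type Extrinsic = Vec<u8>;

enum ApplyOutcome {
    Included,
    Invalid,
}

fn apply_extrinsic_batch(batch: Vec<Extrinsic>) -> Vec<ApplyOutcome> {
    batch.into_iter().map(|ext| apply_one(&ext)).collect()
}

// Stub standing in for the usual per-extrinsic dispatch logic.
fn apply_one(ext: &[u8]) -> ApplyOutcome {
    if ext.is_empty() { ApplyOutcome::Invalid } else { ApplyOutcome::Included }
}
```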

One problem that prevents us from making the memory persistent between the stages of block execution is that some extrinsics can panic

Is this the main reason for the current design? Also, isn't there any way for wasm to handle panics internally, i.e. something like catch_unwind in no_std?

@pepyakin
Contributor Author

pepyakin commented Oct 5, 2022

I did not participate in designing this, so I am not sure about the original intention, but yes, using separate instances in the block builder definitely solves this.

And yes, present-day wasm cannot really handle unwinding correctly, even with std. While it's possible to provide host functions to create try-catch blocks, an attempt at unwinding will lead to memory leaks at best and memory corruption at worst.

@bkchr
Member

bkchr commented Oct 5, 2022

Proper try catch will also require this for Parachains: #10222

@burdges

burdges commented Aug 9, 2023

At a high level, we should not expect verifiers to be the same code as provers. We all know this for crypto and storage of course, but it extends into the block logic too. In particular, your verifier looks radically different from your builder anytime your block logic involves some NP-hard problem like an integer program, or a space-flavored one like NL-hard problems, or something not even slow but simply faster when given some witness.

We'll need real persistent memory for batch verification, from which most crypto benefits but which becomes essential for many zk proof systems. In these, you've many transactions each with their own proofs provided by users, but the block builder performs some computation for merging these proofs. It's morally like preparing the PoV but should happen within a single chain.

A batch verifier would first collect curve points or G_T elements from each transaction or inherent, and then run some batch proof checker on all of them plus some additional inherent data. Importantly, we do not serialize these collected curve points or G_T elements because doing so securely is slow and upstream maintainers would fight you tooth & nail to never expose faster insecure serialization.
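
A hedged sketch of that shape, with a made-up BatchVerifier type standing in for whatever proof system is actually used: the per-transaction calls only push unserialized elements into wasm memory, and a single batched check runs at finalization:

```rust
// Illustrative only: `Point` stands in for unserialized curve points or
// G_T elements from some proof system.
struct Point(u64);

#[derive(Default)]
struct BatchVerifier {
    collected: Vec<Point>,
}

impl BatchVerifier {
    /// Called per transaction/inherent; keeps the elements in wasm memory,
    /// never serializing them back out to the host or to storage.
    fn push(&mut self, elements: Vec<Point>) {
        self.collected.extend(elements);
    }

    /// Called once at block finalization, together with any extra inherent
    /// data, to run the actual batched check.
    fn verify_all(&self, _inherent_data: &[u8]) -> bool {
        // Placeholder for the real batched pairing/MSM check.
        !self.collected.is_empty()
    }
}
```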

Also, anti-MEV measures could exploit batching so that transactions from one block cannot be placed into a different block without seeing the original transaction anyways.

Anyways..

We should definitely do in-memory storage, but ideally we should have some story for when a parachain team's logic really differs between block builders and verifiers, like say a game which approximately solves an integer program at each iteration. It's possible that story becomes frameless runtimes like Tuxedo, but a less radical solution would be replacing FRAME's execute_block with something that calls different instantiations of the tx code, etc. Independently, there might be other reasons to have separate runtime builds for block builders and verifiers.

It's likely Cosmos is way ahead of us here; even assuming their core SDK team made similar design choices, Penumbra would have forked this sort of functionality into their ecosystem by now.
