Enable the bulk memory operations WASM feature #36
Comments
Citing the parent issue:

It's clear about pt. 1. Re pt. 2, I think either way is fine. (UPD: Robin beat me to it, and I agree; I think getting rid of the wasmi executor is on the table.)

The PVF is a bit more complicated. We need to consider what happens if we unconditionally enable bulk mem ops: there is a problem with the upgrade, and if we just YOLO enable it, an adversary could take advantage of that. Unfortunately, until #917 is landed our hands are tied. That implies that the executor configuration should allow disabling bulk mem ops, and it will be disabled for PVF execution. Since the Cumulus PDK uses the same binary for the Runtime and the PVF, parachains won't be able to take advantage of bulk mem ops. Parachains are the overwhelming majority of the users of Substrate, so the impact would be limited until we upgrade the PVF.

Then, the blocker for enabling bulk mem ops for PVFs is the question about metering, which is likely coming at least for PVFs. That means we have to squash this concern touched on in the parent issue.
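To illustrate the kind of executor knob described above, here is a purely hypothetical sketch; the type and field names are invented for illustration and are not Substrate's actual executor API:

```rust
// Hypothetical sketch only: these names are invented for illustration
// and are not Substrate's actual executor API.
pub struct ExecutorSemantics {
    /// Whether the instantiated module may use the bulk memory
    /// operations proposal.
    pub allow_bulk_memory: bool,
}

/// Runtime execution could take advantage of bulk memory operations.
pub fn runtime_semantics() -> ExecutorSemantics {
    ExecutorSemantics { allow_bulk_memory: true }
}

/// PVF execution would keep the feature disabled until the metering
/// question (and the upgrade concern above) is settled.
pub fn pvf_semantics() -> ExecutorSemantics {
    ExecutorSemantics { allow_bulk_memory: false }
}
```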
> For the foreseeable future we will run contracts with an in-runtime wasmi.

Tangential? The wasmi executor refers to `sc-executor-wasmi`, not the sandbox backend.
Should be sanity checked with something outside of pallet benchmarks.
(This is a subissue of paritytech/substrate#10707; I'm creating a new issue to focus on just the bulk memory operations.)
We should seriously consider enabling the bulk memory operations feature in our WASM executor and our runtimes. We have recently discovered that the contracts' benchmarks can currently spend up to 75% of their time within the WASM calling `memset`; not only that, since `wasmtime` doesn't cache-align loops, depending on how the instructions are laid out in memory the performance of `memset`/`memcpy`/etc. can vary very widely, and can become up to ~40% slower compared to the cache-aligned case.
I've done a quick test to compare the performance of the `pallet_contracts/seal_return_per_kb` benchmark as it currently stands today, with and without bulk memory operations enabled. Not only does enabling them cut the benchmark's execution time in half, it should also prevent `wasmtime`'s codegen roulette from regressing the performance.

Now, we could probably just optimize this on `wasmi`'s side so that it doesn't preallocate and clear a 1MB buffer on each invocation (assuming this hasn't been done already; we're still using quite an old version of `wasmi`, and I haven't checked the newest version). Nevertheless, I think this has demonstrated that there's concrete value in enabling this extension: even if `wasmi` is fixed, something else could conceivably allocate large buffers and tank the performance, and then we'd be back to square one. There's also potential for this extension to speed things up in general, considering how widely `memset`/`memcpy` are used under the hood. I don't think it's worth holding back on this extension anymore; we should just pull the trigger and enable it. In the worst case it won't make any difference; in the best case it can significantly speed things up.

What needs to be done? (high level plan)
- Add support for the bulk memory operations to `wasmi`, `wasmi-validation` and `wasm-instrument`, if it hasn't been done yet.
- Enable the `bulk` feature on the `parity-wasm` crate.
- Set `config.wasm_bulk_memory(true)` when initializing wasmtime (a minimal sketch follows below).
- Compile the runtimes with `-C target-feature=+bulk-memory`.
- Do the same in `polkadot`: pass the `-C target-feature=+bulk-memory` flag when building runtimes.

Anything else I'm missing?
cc @pepyakin @athei @Robbepop