Remove no-op reinterpret `wasmi` instructions #518

Robbepop · 2022-10-15T08:42:10Z

This removes the 4 reinterpret wasmi bytecode instructions since they are doing nothing at runtime anyways.
Turns out that for some weird reason removing those 4 instructions significantly regresses performance of the wasmi interpreter ...
Until this is fixed or we know why that happens I won't merge this PR.

They did nothing before this change so we can also just remove them. This is due to the fact that wasmi bytecode is untyped.

paritytech-cicd-pr · 2022-10-15T08:56:24Z

BENCHMARKS

	NATIVE			WASMTIME
BENCHMARK	MASTER	PR	DIFF	MASTER	PR	DIFF	WASMTIME OVERHEAD
`execute/` `bare_call_0`	1.23ms	1.20ms	🟢 -1.98%	902.71µs	908.39µs	🔴 0.65%	🟢 -25%
`execute/` `bare_call_0/typed`	652.44µs	651.51µs	⚪ 0.02%	389.28µs	408.21µs	🔴 4.85%	🟢 -37%
`execute/` `bare_call_1`	1.33ms	1.32ms	🟢 -0.73%	1.11ms	1.22ms	🔴 10.07%	🟢 -8%
`execute/` `bare_call_16`	3.07ms	2.85ms	🟢 -7.35%	4.60ms	4.64ms	⚪ 0.07%	🟡 63%
`execute/` `bare_call_16/typed`	1.53ms	1.36ms	🟢 -11.26%	2.00ms	2.02ms	⚪ 0.68%	🟢 48%
`execute/` `bare_call_1/typed`	776.18µs	764.30µs	🟢 -1.43%	699.57µs	697.45µs	⚪ -0.45%	🟢 -9%
`execute/` `bare_call_4`	1.65ms	1.64ms	🟢 -0.70%	1.72ms	1.75ms	🔴 1.56%	🟢 7%
`execute/` `bare_call_4/typed`	876.78µs	857.93µs	🟢 -2.09%	824.43µs	802.98µs	🟢 -2.70%	🟢 -6%
`execute/` `br_table`	830.81µs	890.46µs	⚪ 5.23%	878.53µs	920.41µs	🔴 4.49%	🟢 3%
`execute/` `count_until`	1.15ms	772.43µs	🟢 -33.07%	1.98ms	2.37ms	🔴 19.67%	🔴 207%
`execute/` `factorial_iterative`	500.72µs	364.59µs	🟢 -27.17%	801.11µs	842.85µs	🔴 5.38%	🔴 131%
`execute/` `factorial_recursive`	761.11µs	700.12µs	🟢 -8.19%	1.23ms	1.27ms	🔴 3.64%	🟡 82%
`execute/` `fib_iterative`	2.54ms	1.67ms	🟢 -34.17%	4.33ms	4.48ms	🔴 3.69%	🔴 168%
`execute/` `fib_recursive`	7.98ms	6.64ms	🟢 -16.87%	11.19ms	11.29ms	⚪ 0.86%	🟡 70%
`execute/` `global_bump`	2.00ms	1.17ms	🟢 -41.59%	3.00ms	3.17ms	🔴 5.57%	🔴 171%
`execute/` `global_const`	1.21ms	898.46µs	🟢 -26.18%	2.21ms	2.33ms	🔴 5.80%	🔴 159%
`execute/` `host_calls`	32.42µs	31.52µs	🟢 -2.72%	35.79µs	37.24µs	🔴 4.00%	🟢 18%
`execute/` `memory_fill`	2.04ms	1.50ms	🟢 -26.73%	3.74ms	3.99ms	🔴 6.75%	🔴 166%
`execute/` `memory_sum`	2.02ms	1.50ms	🟢 -25.87%	3.80ms	3.97ms	🔴 4.58%	🔴 165%
`execute/` `memory_vec_add`	4.42ms	3.16ms	🟢 -28.56%	7.72ms	8.22ms	🔴 6.48%	🔴 161%
`execute/` `recursive_is_even`	1.33ms	1.28ms	🟢 -4.21%	2.02ms	2.10ms	🔴 4.08%	🟡 64%
`execute/` `recursive_ok`	171.82µs	165.80µs	🟢 -3.33%	282.62µs	295.25µs	🔴 4.46%	🟡 78%
`execute/` `recursive_scan`	219.00µs	200.94µs	🟢 -8.34%	359.73µs	375.06µs	🔴 4.10%	🟡 87%
`execute/` `recursive_trap`	16.95µs	16.19µs	🟢 -4.29%	28.84µs	28.87µs	⚪ 0.13%	🟡 78%
`execute/` `regex_redux`	815.50µs	660.73µs	🟢 -18.75%	1.40ms	1.45ms	🔴 3.17%	🔴 119%
`execute/` `rev_complement`	890.52µs	620.62µs	🟢 -30.32%	1.39ms	1.43ms	🔴 2.87%	🔴 130%
`execute/` `tiny_keccak`	784.55µs	470.76µs	🟢 -39.94%	1.09ms	1.19ms	🔴 9.96%	🔴 153%
`execute/` `trunc_f2i`	1.33ms	1.09ms	🟢 -17.77%	2.22ms	2.44ms	🔴 9.77%	🔴 123%
`instantiate/` `wasm_kernel`	70.63µs	70.61µs	⚪ 1.16%	71.43µs	95.52µs	🔴 32.71%	🟢 35%
`translate/` `erc1155`	239.75µs	246.43µs	🔴 2.37%	373.63µs	380.49µs	🔴 1.99%	🟡 54%
`translate/` `erc20`	117.87µs	119.29µs	⚪ 1.45%	181.39µs	185.79µs	🔴 2.23%	🟡 56%
`translate/` `erc721`	167.93µs	169.41µs	🔴 1.44%	264.59µs	270.11µs	🔴 1.97%	🟡 59%
`translate/` `spidermonkey`	0.00ns	0.00ns	⚪ 0.53%	0.00ns	0.00ns	🔴 1.99%	🟢 0%
`translate/` `wasm_kernel`	4.26ms	4.19ms	🟢 -1.92%	7.07ms	7.19ms	🔴 2.16%	🟡 72%

Link to pipeline

Robbepop · 2022-10-15T09:44:31Z

I used the cargo-asm tool to inspect the Executor::execute function on an assembly level to understand if Rust and LLVM somehow fail to properly optimize the function after removal of the 4 unnecessary instructions. The result is: There is a huge difference in the assembly output!

See yourself:

master: https://gist.github.com/Robbepop/cde5a25f00b78259a11170a6614aca90
- Roughly 3.8k lines of assembly
PR: https://gist.github.com/Robbepop/9df0eb661c5fcab9aa8391221fda7196
- Roughly 5.9k lines of assembly

…-refactor

codecov-commenter · 2022-10-23T10:18:21Z

Codecov Report

Merging #518 (4b3d74d) into master (f5e1c46) will decrease coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #518      +/-   ##
==========================================
- Coverage   79.73%   79.65%   -0.08%     
==========================================
  Files          75       75              
  Lines        6291     6267      -24     
==========================================
- Hits         5016     4992      -24     
  Misses       1275     1275

Impacted Files	Coverage Δ
crates/wasmi/src/engine/bytecode/mod.rs	`100.00% <ø> (ø)`
crates/wasmi/src/engine/executor.rs	`98.04% <ø> (-0.05%)`	⬇️
crates/wasmi/src/engine/bytecode/tests.rs	`100.00% <100.00%> (ø)`
crates/wasmi/src/engine/func_builder/mod.rs	`89.72% <100.00%> (-0.13%)`	⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

…to rf-bytecode-refactor

Mostly to trigger CI benchmarks ...

Robbepop · 2022-11-04T10:09:39Z

While the changes introduces in this PR were a regression to performance for wasmi with Rust 1.64 they are now significantly improving wasmi performance with Rust 1.65. Therefore it is now ready to merge.

Robbepop added 3 commits October 15, 2022 10:26

cleanup bytecode test

1819da3

remove unnecessary wasmi reinterpret instructions

ffc1be8

They did nothing before this change so we can also just remove them. This is due to the fact that wasmi bytecode is untyped.

move ReturnIfNez next to Return instruction

aab18f0

Robbepop mentioned this pull request Oct 15, 2022

Massive performance regression between nightly-2022-08-12 and nightly-2022-08-13 rust-lang/rust#102952

Open

Robbepop added 3 commits October 20, 2022 10:24

Merge branch 'master' of github.com:paritytech/wasmi into rf-bytecode…

efd0e2c

…-refactor

Merge branch 'master' into rf-bytecode-refactor

90df014

make crate compile again after merge

b47fd5d

Robbepop added 5 commits October 23, 2022 15:22

reorder match arms in executor

a7ec54c

Merge branch 'master' into rf-bytecode-refactor

2ffc403

revert moving ReturnIfNez variant

c13a30a

Merge branch 'rf-bytecode-refactor' of github.com:paritytech/wasmi in…

4797a67

…to rf-bytecode-refactor

Merge branch 'master' into rf-bytecode-refactor

9fa48fd

Robbepop changed the title ~~Refactor wasmi bytecode~~ Remove no-op reinterpret wasmi instructions Oct 23, 2022

Robbepop added 7 commits October 26, 2022 11:24

Merge branch 'master' into rf-bytecode-refactor

d1187c3

Merge branch 'master' into rf-bytecode-refactor

4b3d74d

Merge branch 'master' into rf-bytecode-refactor

2956d94

Merge branch 'master' into rf-bytecode-refactor

aa9b93f

Merge branch 'master' into rf-bytecode-refactor

e708e53

add docs

0a14f6f

Mostly to trigger CI benchmarks ...

Merge branch 'master' into rf-bytecode-refactor

3c9c924

Robbepop merged commit 9e9cbab into master Nov 4, 2022

Robbepop deleted the rf-bytecode-refactor branch November 4, 2022 10:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove no-op reinterpret `wasmi` instructions #518

Remove no-op reinterpret `wasmi` instructions #518

Robbepop commented Oct 15, 2022 •

edited

Loading

paritytech-cicd-pr commented Oct 15, 2022 •

edited

Loading

Robbepop commented Oct 15, 2022

codecov-commenter commented Oct 23, 2022 •

edited

Loading

Robbepop commented Nov 4, 2022

Remove no-op reinterpret wasmi instructions #518

Remove no-op reinterpret wasmi instructions #518

Conversation

Robbepop commented Oct 15, 2022 • edited Loading

paritytech-cicd-pr commented Oct 15, 2022 • edited Loading

BENCHMARKS

Robbepop commented Oct 15, 2022

codecov-commenter commented Oct 23, 2022 • edited Loading

Codecov Report

Robbepop commented Nov 4, 2022

Remove no-op reinterpret `wasmi` instructions #518

Remove no-op reinterpret `wasmi` instructions #518

Robbepop commented Oct 15, 2022 •

edited

Loading

paritytech-cicd-pr commented Oct 15, 2022 •

edited

Loading

codecov-commenter commented Oct 23, 2022 •

edited

Loading