Implement lazy Wasm to `wasmi` bytecode translation #844

Robbepop · 2023-12-09T13:58:31Z

Closes #732.
Closes #516.

Benchmarks

Execution

Local benchmarks so far indicate that call-intensive workloads regress by 10-15% when functions are compiled eagerly. This is bad, since ideally only lazily compiled functions would pay a performance penalty.
It might be possible to improve this situation by speeding up how CodeMap returns compiled functions.
Furthermore, call-intensive workloads are fairly rare in practice. Compute-intensive workloads are not significantly affected by the changes introduced with lazy compilation.

Translation

After getting lazy translation up and running we were able to gather some promising benchmarks:

Spidermonkey

lazy+validated speed-up over eager+validated: 2.1x
lazy+unchecked speed-up over eager+unchecked: 8.4x
lazy+unchecked speed-up over eager+validated: 10.2x

translate/spidermonkey/checked/eager/default
    time:   [46.189 ms 46.407 ms 46.621 ms]
translate/spidermonkey/checked/eager/fuel
    time:   [47.625 ms 47.845 ms 47.989 ms]
translate/spidermonkey/checked/lazy/default
    time:   [21.786 ms 21.822 ms 21.896 ms]
translate/spidermonkey/unchecked/eager/default
    time:   [37.774 ms 38.120 ms 38.444 ms]
translate/spidermonkey/unchecked/eager/fuel
    time:   [39.838 ms 39.888 ms 39.955 ms]
translate/spidermonkey/unchecked/lazy/default
    time:   [4.5235 ms 4.5328 ms 4.5440 ms]
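
For reference, the speed-up figures listed above can be reproduced from the reported times. The following is a minimal sketch, assuming the middle value of each bracketed triple is the point estimate; the helper names `speedup`, `round1` and `spidermonkey_speedups` are made up for illustration.

```rust
/// Speed-up of `candidate` over `baseline`; both times must use the same unit.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Rounds to one decimal place, matching the precision used in the summary.
fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}

/// Returns the three Spidermonkey ratios in the order they are listed:
/// lazy+validated vs eager+validated, lazy+unchecked vs eager+unchecked,
/// lazy+unchecked vs eager+validated.
fn spidermonkey_speedups() -> [f64; 3] {
    // Middle values (in ms) copied from the benchmark output above.
    let checked_eager = 46.407;
    let checked_lazy = 21.822;
    let unchecked_eager = 38.120;
    let unchecked_lazy = 4.5328;
    [
        round1(speedup(checked_eager, checked_lazy)),
        round1(speedup(unchecked_eager, unchecked_lazy)),
        round1(speedup(checked_eager, unchecked_lazy)),
    ]
}
```

The same two helpers apply unchanged to the ERC-20 and ERC-1155 numbers.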

ERC-20

lazy+validated speed-up over eager+validated: 1.8x
lazy+unchecked speed-up over eager+unchecked: 2.7x
lazy+unchecked speed-up over eager+validated: 3.2x

translate/erc20/checked/eager/default
    time:   [76.379 µs 77.450 µs 79.144 µs]
translate/erc20/checked/eager/fuel
    time:   [81.388 µs 81.930 µs 82.994 µs]
translate/erc20/checked/lazy/default
    time:   [42.184 µs 42.383 µs 42.683 µs]
translate/erc20/unchecked/eager/default
    time:   [65.050 µs 65.228 µs 65.471 µs]
translate/erc20/unchecked/eager/fuel
    time:   [68.156 µs 68.341 µs 68.619 µs]
translate/erc20/unchecked/lazy/default
    time:   [24.212 µs 24.328 µs 24.413 µs]

ERC-1155

lazy+validated speed-up over eager+validated: 1.95x
lazy+unchecked speed-up over eager+unchecked: 4.0x
lazy+unchecked speed-up over eager+validated: 4.6x

translate/erc1155/checked/eager/default
    time:   [157.34 µs 157.81 µs 158.31 µs]
translate/erc1155/checked/eager/fuel
    time:   [169.16 µs 169.41 µs 169.66 µs]
translate/erc1155/checked/lazy/default
    time:   [79.642 µs 80.586 µs 81.850 µs]
translate/erc1155/unchecked/eager/default
    time:   [135.57 µs 136.02 µs 136.49 µs]
translate/erc1155/unchecked/eager/fuel
    time:   [146.67 µs 147.04 µs 147.25 µs]
translate/erc1155/unchecked/lazy/default
    time:   [33.641 µs 34.011 µs 34.278 µs]

Translation Benchmarks: Conclusion

We see roughly 2x speed-up with lazy mode compared to eager when validating the input.
We see roughly 3-8x speed-up with lazy mode compared to eager without validating the input.
We see roughly 3-10x speed-up with unchecked lazy mode compared to validated eager mode.

TODOs

Add CompilationMode to Config
Refactor ModuleBuilder to remove unnecessary lifetime annotations.
Add lazy Wasm function translation during Wasm module parsing.
Add lazy Wasm function translation when calling a lazily compiled Wasm function the first time.
Return wasmi::Error from the Wasmi instruction executor instead of TrapCode.
- This is required because now call instructions may fail with TranslationError.
Fix performance penalties for call-intensive workloads if possible.
Make it technically impossible to race in CodeMap::get.
- Currently a RwLock is used, which is unfair towards writers. However, lazily compiling a function requires write access. This can be fixed by introducing an additional state that threads query while waiting for the function to be compiled.
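
One way to picture the additional state mentioned in the last item is sketched below. This is only an illustration built on `Mutex` and `Condvar` from the standard library, not the approach taken in CodeMap; `FuncState`, `LazyFunc` and the `translate` callback are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical per-function state; not the actual `wasmi` data structure.
enum FuncState {
    /// Raw Wasm function body that has not been translated yet.
    Uncompiled(Vec<u8>),
    /// Some thread is translating right now; other threads wait.
    Compiling,
    /// Translated bytecode, shared between callers.
    Compiled(Arc<Vec<u32>>),
    /// Translation failed; every caller sees the same error.
    Failed(String),
}

pub struct LazyFunc {
    state: Mutex<FuncState>,
    ready: Condvar,
}

impl LazyFunc {
    pub fn new(wasm_body: Vec<u8>) -> Self {
        LazyFunc {
            state: Mutex::new(FuncState::Uncompiled(wasm_body)),
            ready: Condvar::new(),
        }
    }

    /// Returns the compiled code, translating it on first use.
    /// Note: if `translate` panics, waiters would block forever in this sketch.
    pub fn get(
        &self,
        translate: impl FnOnce(&[u8]) -> Result<Vec<u32>, String>,
    ) -> Result<Arc<Vec<u32>>, String> {
        let guard = self.state.lock().unwrap();
        // Sleep while another thread holds the `Compiling` state.
        let mut guard = self
            .ready
            .wait_while(guard, |state| matches!(*state, FuncState::Compiling))
            .unwrap();
        let body = match &mut *guard {
            FuncState::Compiled(code) => return Ok(Arc::clone(code)),
            FuncState::Failed(error) => return Err(error.clone()),
            FuncState::Uncompiled(body) => std::mem::take(body),
            FuncState::Compiling => unreachable!("wait_while returns only after compilation ended"),
        };
        // Claim the translation, then release the lock while doing the work.
        *guard = FuncState::Compiling;
        drop(guard);

        let result = translate(&body).map(Arc::new);

        let mut guard = self.state.lock().unwrap();
        *guard = match &result {
            Ok(code) => FuncState::Compiled(Arc::clone(code)),
            Err(error) => FuncState::Failed(error.clone()),
        };
        drop(guard);
        self.ready.notify_all();
        result
    }
}
```

In this sketch exactly one caller performs the translation, and the lock is not held while translating, so unrelated lookups would not be blocked by a slow translation.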

paritytech-cicd-pr · 2023-12-09T14:18:22Z

BENCHMARKS

BENCHMARK	NATIVE MASTER	NATIVE PR	NATIVE DIFF	WASMTIME MASTER	WASMTIME PR	WASMTIME DIFF	WASMTIME OVERHEAD
`execute/` `br_table`	1.51ms	1.45ms	🟢 -3.87%	1.33ms	1.27ms	🟢 -4.49%	🟢 -13%
`execute/` `call/host/1`	45.50µs	53.98µs	🔴 18.64%	63.18µs	67.95µs	🔴 7.55%	🟢 26%
`execute/` `call/rec`	166.01µs	181.91µs	🔴 9.58%	343.20µs	368.13µs	🔴 7.26%	🔴 102%
`execute/` `count_until`	7.48ms	6.53ms	🟢 -12.69%	7.48ms	7.48ms	⚪ -0.02%	🟢 15%
`execute/` `divrem`	6.27ms	6.22ms	⚪ -0.78%	6.98ms	8.43ms	🔴 20.85%	🟢 36%
`execute/` `factorial/iter`	259.75µs	264.79µs	🔴 1.94%	312.39µs	333.98µs	🔴 6.91%	🟢 26%
`execute/` `factorial/rec`	949.59µs	767.60µs	🟢 -19.16%	1.24ms	1.36ms	🔴 9.73%	🟡 77%
`execute/` `fibonacci/iter`	1.29ms	1.36ms	🔴 5.75%	1.28ms	1.27ms	⚪ -0.63%	🟢 -7%
`execute/` `fibonacci/rec`	6.03ms	6.55ms	🔴 8.70%	12.70ms	13.85ms	🔴 9.03%	🔴 111%
`execute/` `fibonacci/tail`	1.70ms	1.44ms	🟢 -15.24%	3.89ms	3.78ms	🟢 -2.86%	🔴 161%
`execute/` `fuse`	7.72ms	7.19ms	🟢 -6.92%	11.60ms	12.20ms	🔴 5.15%	🟡 70%
`execute/` `global/bump`	1.32ms	1.32ms	⚪ 0.13%	1.55ms	1.62ms	🔴 4.64%	🟢 23%
`execute/` `global/get_const`	733.61µs	687.96µs	🟢 -6.22%	747.12µs	750.70µs	⚪ 0.48%	🟢 9%
`execute/` `is_even/rec`	1.06ms	1.18ms	🔴 10.88%	2.17ms	2.36ms	🔴 8.63%	🟡 100%
`execute/` `memory/fill_bytes`	1.09ms	1.12ms	🔴 2.66%	1.41ms	1.34ms	🟢 -4.94%	🟢 20%
`execute/` `memory/sum_bytes`	1.04ms	1.15ms	🔴 11.06%	1.23ms	1.32ms	🔴 7.14%	🟢 14%
`execute/` `memory/vec_add`	2.96ms	2.95ms	⚪ -0.57%	3.59ms	3.89ms	🔴 8.34%	🟢 32%
`execute/` `recursive_scan`	188.56µs	202.18µs	🔴 7.22%	376.58µs	396.16µs	🔴 5.20%	🟡 96%
`execute/` `recursive_trap`	15.31µs	17.33µs	🔴 13.16%	34.22µs	37.21µs	🔴 8.74%	🔴 115%
`execute/` `regex_redux`	591.39µs	582.21µs	🟢 -1.55%	1.10ms	1.06ms	🟢 -3.49%	🟡 82%
`execute/` `rev_complement`	443.17µs	456.22µs	🔴 2.94%	674.50µs	668.24µs	⚪ -0.93%	🟢 46%
`execute/` `tiny_keccak`	347.50µs	352.09µs	🔴 1.32%	386.87µs	384.19µs	⚪ -0.69%	🟢 9%
`execute/` `trunc_f2i`	613.22µs	613.79µs	⚪ 0.09%	963.86µs	1.01ms	🔴 4.44%	🟡 64%
`instantiate/` `wasm_kernel`	56.13µs	54.05µs	🟢 -3.71%	56.97µs	52.88µs	🟢 -7.16%	🟢 -2%
`overhead/` `call/typed/0`	1.19ms	1.26ms	🔴 5.79%	754.30µs	874.08µs	🔴 15.88%	🟢 -31%
`overhead/` `call/typed/16`	1.61ms	1.67ms	🔴 3.21%	2.09ms	1.98ms	🟢 -5.10%	🟢 19%
`overhead/` `call/untyped/0`	1.62ms	1.57ms	🟢 -2.89%	1.26ms	1.20ms	🟢 -4.63%	🟢 -24%
`overhead/` `call/untyped/16`	2.48ms	2.45ms	⚪ -1.34%	4.05ms	3.77ms	🟢 -7.12%	🟡 54%
`translate/` `bz2/checked/eager/default`	1.36ms	1.32ms	🟢 -3.11%	2.36ms	2.39ms	⚪ 1.33%	🟡 81%
`translate/` `bz2/checked/eager/fuel`	1.47ms	1.43ms	🟢 -3.12%	2.56ms	2.62ms	🔴 2.14%	🟡 83%
`translate/` `bz2/checked/lazy/default`	1.37ms	547.76µs	🟢 -59.98%	2.37ms	981.71µs	🟢 -58.59%	🟡 79%
`translate/` `bz2/unchecked/eager/default`	1.10ms	1.07ms	🟢 -2.96%	1.76ms	1.83ms	🔴 3.81%	🟡 71%
`translate/` `bz2/unchecked/eager/fuel`	1.19ms	1.17ms	🟢 -2.09%	1.95ms	2.04ms	🔴 4.41%	🟡 74%
`translate/` `bz2/unchecked/lazy/default`	1.10ms	35.32µs	🟢 -96.79%	1.77ms	45.53µs	🟢 -97.42%	🟢 29%
`translate/` `erc1155/checked/eager/default`	281.89µs	276.76µs	🟢 -1.82%	475.53µs	469.57µs	⚪ -1.25%	🟡 70%
`translate/` `erc1155/checked/eager/fuel`	302.31µs	296.16µs	🟢 -2.04%	505.29µs	509.78µs	⚪ 0.89%	🟡 72%
`translate/` `erc1155/checked/lazy/default`	282.13µs	128.98µs	🟢 -54.28%	474.11µs	213.61µs	🟢 -54.95%	🟡 66%
`translate/` `erc1155/unchecked/eager/default`	233.32µs	228.82µs	🟢 -1.93%	361.94µs	364.84µs	⚪ 0.80%	🟡 59%
`translate/` `erc1155/unchecked/eager/fuel`	251.51µs	245.76µs	🟢 -2.29%	386.27µs	395.53µs	🔴 2.40%	🟡 61%
`translate/` `erc1155/unchecked/lazy/default`	232.73µs	24.78µs	🟢 -89.35%	359.69µs	31.76µs	🟢 -91.17%	🟢 28%
`translate/` `erc20/checked/eager/default`	135.25µs	135.12µs	⚪ -0.10%	228.16µs	226.71µs	⚪ -0.64%	🟡 68%
`translate/` `erc20/checked/eager/fuel`	143.89µs	142.15µs	🟢 -1.21%	239.47µs	240.59µs	⚪ 0.47%	🟡 69%
`translate/` `erc20/checked/lazy/default`	135.67µs	65.85µs	🟢 -51.46%	229.13µs	108.05µs	🟢 -52.84%	🟡 64%
`translate/` `erc20/unchecked/eager/default`	112.54µs	110.64µs	🟢 -1.69%	174.40µs	174.60µs	⚪ 0.11%	🟡 58%
`translate/` `erc20/unchecked/eager/fuel`	119.70µs	118.15µs	🟢 -1.30%	183.48µs	186.79µs	🔴 1.80%	🟡 58%
`translate/` `erc20/unchecked/lazy/default`	112.42µs	19.09µs	🟢 -83.02%	175.05µs	24.22µs	🟢 -86.17%	🟢 27%
`translate/` `erc721/checked/eager/default`	194.19µs	191.41µs	🟢 -1.43%	328.74µs	327.90µs	⚪ -0.26%	🟡 71%
`translate/` `erc721/checked/eager/fuel`	204.90µs	201.18µs	🟢 -1.81%	344.00µs	346.66µs	⚪ 0.77%	🟡 72%
`translate/` `erc721/checked/lazy/default`	194.47µs	92.37µs	🟢 -52.50%	328.22µs	153.96µs	🟢 -53.09%	🟡 67%
`translate/` `erc721/unchecked/eager/default`	158.23µs	154.31µs	🟢 -2.48%	247.99µs	248.70µs	⚪ 0.29%	🟡 61%
`translate/` `erc721/unchecked/eager/fuel`	166.55µs	164.02µs	🟢 -1.52%	267.10µs	266.02µs	⚪ -0.40%	🟡 62%
`translate/` `erc721/unchecked/lazy/default`	158.25µs	21.75µs	🟢 -86.25%	249.39µs	27.87µs	🟢 -88.83%	🟢 28%
`translate/` `pulldown_cmark/checked/eager/default`	3.62ms	3.58ms	⚪ -1.15%	6.03ms	6.15ms	🔴 1.97%	🟡 72%
`translate/` `pulldown_cmark/checked/eager/fuel`	3.90ms	3.87ms	⚪ -0.69%	6.49ms	6.80ms	🔴 4.76%	🟡 76%
`translate/` `pulldown_cmark/checked/lazy/default`	3.62ms	1.55ms	🟢 -57.25%	6.10ms	2.60ms	🟢 -57.34%	🟡 68%
`translate/` `pulldown_cmark/unchecked/eager/default`	3.03ms	2.99ms	🟢 -1.40%	4.66ms	4.76ms	🔴 2.21%	🟡 59%
`translate/` `pulldown_cmark/unchecked/eager/fuel`	3.29ms	3.26ms	⚪ -1.03%	5.07ms	5.26ms	🔴 3.74%	🟡 62%
`translate/` `pulldown_cmark/unchecked/lazy/default`	3.04ms	243.16µs	🟢 -92.00%	4.67ms	244.95µs	🟢 -94.75%	🟢 1%
`translate/` `spidermonkey/checked/eager/default`	76.07ms	75.69ms	⚪ -0.51%	131.98ms	133.34ms	⚪ 1.03%	🟡 76%
`translate/` `spidermonkey/checked/eager/fuel`	82.22ms	82.27ms	⚪ 0.07%	141.53ms	145.19ms	🔴 2.58%	🟡 76%
`translate/` `spidermonkey/checked/lazy/default`	76.09ms	32.64ms	🟢 -57.11%	131.53ms	56.57ms	🟢 -57.00%	🟡 73%
`translate/` `spidermonkey/unchecked/eager/default`	62.93ms	62.34ms	⚪ -0.93%	100.48ms	102.50ms	🔴 2.01%	🟡 64%
`translate/` `spidermonkey/unchecked/eager/fuel`	68.68ms	68.29ms	⚪ -0.56%	110.13ms	112.83ms	🔴 2.45%	🟡 65%
`translate/` `spidermonkey/unchecked/lazy/default`	62.87ms	3.05ms	🟢 -95.14%	100.33ms	3.71ms	🟢 -96.30%	🟢 22%
`translate/` `wasm_kernel/checked/eager/default`	5.04ms	5.04ms	⚪ 0.07%	8.63ms	8.69ms	⚪ 0.70%	🟡 72%
`translate/` `wasm_kernel/checked/eager/fuel`	5.17ms	5.20ms	⚪ 0.61%	9.05ms	9.20ms	🔴 1.58%	🟡 77%
`translate/` `wasm_kernel/checked/lazy/default`	5.02ms	2.42ms	🟢 -51.76%	8.61ms	4.07ms	🟢 -52.74%	🟡 68%
`translate/` `wasm_kernel/unchecked/eager/default`	4.06ms	4.07ms	⚪ 0.27%	6.55ms	6.67ms	🔴 1.88%	🟡 64%
`translate/` `wasm_kernel/unchecked/eager/fuel`	4.20ms	4.23ms	⚪ 0.70%	6.96ms	7.14ms	🔴 2.49%	🟡 69%
`translate/` `wasm_kernel/unchecked/lazy/default`	4.06ms	394.31µs	🟢 -90.29%	6.58ms	472.61µs	🟢 -92.82%	🟢 20%

codecov-commenter · 2023-12-09T14:26:36Z

Codecov Report

Attention: 297 lines in your changes are missing coverage. Please review.

Comparison is base (ac00319) 80.90% compared to head (5cec5c5) 81.39%.

Files	Patch %	Lines
crates/wasmi/src/engine/code_map.rs	28.97%	76 Missing ⚠️
crates/wasmi/src/engine/mod.rs	58.58%	41 Missing ⚠️
crates/wasmi/src/error.rs	46.03%	34 Missing ⚠️
crates/wasmi/src/engine/translator/mod.rs	70.09%	32 Missing ⚠️
crates/wasmi/src/module/parser.rs	84.17%	22 Missing ⚠️
crates/wasmi/src/engine/translator/error.rs	0.00%	15 Missing ⚠️
crates/wasmi/src/module/builder.rs	88.31%	9 Missing ⚠️
crates/wasmi/src/module/mod.rs	84.74%	9 Missing ⚠️
crates/wasmi/src/engine/bytecode/utils.rs	45.45%	6 Missing ⚠️
crates/wasmi/src/engine/executor/instrs.rs	16.66%	5 Missing ⚠️
... and 20 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #844      +/-   ##
==========================================
+ Coverage   80.90%   81.39%   +0.48%     
==========================================
  Files         257      256       -1     
  Lines       22665    22772     +107     
==========================================
+ Hits        18338    18535     +197     
+ Misses       4327     4237      -90

This fixes a problem in relink_result: CompiledFunc information such as results.len() is often not available at the time it is required, because the compiled function entities are not yet initialized. Using ModuleHeader instead fixes this issue, which should improve codegen in these situations and make codegen independent of translation order.

- This introduces separate function entities: CompiledFuncEntity for eager translation and UncompiledFuncEntity for lazy translation.
- This commit does not yet dispatch on UncompiledFuncEntity during execution of call instructions.
- Furthermore, this commit does not yet use the new LazyFuncTranslator to actually translate Wasm functions lazily.

The cycle existed because Engine held ModuleHeader, which itself held Engine. The cycle was broken by introducing EngineWeak, which is just a fancy wrapper around a Weak pointer to an Engine, and making ModuleHeader hold EngineWeak instead of Engine. Therefore, Engine access via ModuleHeader may now fail if the Engine no longer exists. However, since ModuleHeader is only accessed via its Engine, this should technically never occur.
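
The ownership shape described here can be sketched with `Arc` and `Weak` from the standard library. This is a simplified illustration under assumed field layouts, not the actual `wasmi` definitions; `EngineInner`, `register_header` and `strong_count` are hypothetical and exist only to make the example observable.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical shared engine state that stores module headers.
struct EngineInner {
    headers: Mutex<Vec<ModuleHeader>>,
}

/// Strong handle to the engine.
#[derive(Clone)]
pub struct Engine {
    inner: Arc<EngineInner>,
}

/// Weak handle that does not keep the engine alive.
#[derive(Clone)]
pub struct EngineWeak {
    inner: Weak<EngineInner>,
}

/// Holds only a weak handle, so Engine -> ModuleHeader -> Engine is no cycle.
pub struct ModuleHeader {
    engine: EngineWeak,
}

impl Engine {
    pub fn new() -> Self {
        Engine {
            inner: Arc::new(EngineInner {
                headers: Mutex::new(Vec::new()),
            }),
        }
    }

    pub fn downgrade(&self) -> EngineWeak {
        EngineWeak {
            inner: Arc::downgrade(&self.inner),
        }
    }

    /// Stores a header inside the engine (assumed API for this sketch).
    pub fn register_header(&self) {
        let header = ModuleHeader {
            engine: self.downgrade(),
        };
        self.inner.headers.lock().unwrap().push(header);
    }

    /// Number of strong handles; stays at 1 because headers hold weak ones.
    pub fn strong_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

impl EngineWeak {
    /// Returns `None` once the last strong `Engine` handle is gone.
    pub fn upgrade(&self) -> Option<Engine> {
        self.inner.upgrade().map(|inner| Engine { inner })
    }
}

impl ModuleHeader {
    /// Engine access through the header can fail, as noted above.
    pub fn engine(&self) -> Option<Engine> {
        self.engine.upgrade()
    }
}
```

With a strong handle in `ModuleHeader`, the strong count would never reach zero and the engine would leak; with the weak handle, dropping the last `Engine` frees everything.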

Robbepop added 11 commits December 8, 2023 19:17

add CompilationMode to Config

563110e

rename builder::ModuleImports -> ModuleImportsBuilder

b42e821

return reference to GlobalType

708416c

split ModuleBuilder into its header

6943e28

refactor Wasm module parsing

1c65d11

This commit removes all lifetime annotations from parsing related types. This is going to be important since we require the new ModuleHeader type to be stored in the Engine for all lazily compiled Wasm functions for translation purposes.

apply rustfmt

3ecbd1e

remove debug printlns

2444adf

fix intra doc link

3ea9a32

re-export CompilationMode from crate root

ec80bcb

apply rustfmt

fa75e91

silence warning

8b6e095

Robbepop added 17 commits December 10, 2023 12:28

rename FunctionTranslator -> FuncTranslationDriver

ff34584

refactor ArenaIndex impl for CompiledFunc

0414c48

add CompiledFunc -> FuncIdx mapping for ModuleHeader

0d12b6b

apply rustftm

7d96718

add FuncType::len_results

83e08d3

Required in last commit. (oups)

use new as uniform translation driver constructor

c78bc7b

add setup method to the WasmTranslator trait

04b37a8

add LazyFuncTranslator type

dbf8e48

extend Engine[Inner] docs

4b86a69

remove len_results field from CompiledFuncEntity

b2d2ba0

make as_compiled method test-only

474d6a4

make use of InternalFuncEntity::uninit

76f49a7

re-export LazyFuncTranslator from engine module

7f2fa5e

refactor and use new func translators

6a98a03

apply clippy suggestions

72330bb

Robbepop added 27 commits December 14, 2023 23:40

return Error from wasmi instruction executors

0228515

This allows us to properly handle failed lazy translations in call instruction executions.
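
As a rough illustration of why a plain trap code is no longer enough, the sketch below shows a call path that can fail either with a trap or with a translation error. All types and functions here are simplified stand-ins defined for the example, not the real `wasmi` definitions.

```rust
/// Simplified stand-in for a trap code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrapCode {
    IntegerDivisionByZero,
}

/// Simplified stand-in for a translation error.
#[derive(Debug, PartialEq, Eq)]
pub struct TranslationError(pub String);

/// Error type wide enough to carry both failure kinds.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Trap(TrapCode),
    Translation(TranslationError),
}

impl From<TrapCode> for Error {
    fn from(code: TrapCode) -> Self {
        Error::Trap(code)
    }
}

impl From<TranslationError> for Error {
    fn from(error: TranslationError) -> Self {
        Error::Translation(error)
    }
}

/// Hypothetical lazy translation step: rejects an empty function body.
fn translate_lazily(body: &[u8]) -> Result<usize, TranslationError> {
    if body.is_empty() {
        return Err(TranslationError("empty function body".to_string()));
    }
    Ok(body.len())
}

/// Hypothetical call executor: translates the callee on demand, then runs it.
/// Returning only `TrapCode` could not express the translation failure.
pub fn execute_call(body: &[u8], divisor: i32) -> Result<i32, Error> {
    let _instruction_count = translate_lazily(body)?;
    if divisor == 0 {
        return Err(TrapCode::IntegerDivisionByZero.into());
    }
    Ok(100 / divisor)
}
```

The `?` operator relies on the `From` impls, so executor code stays the same whether a callee traps or fails to translate.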

remove usage of Trap

783b939

Now wasmi::Error takes over the responsibilities of Trap. This makes it possible to remove an unnecessary Box indirection.

improve CodeMap::get method internals

d043826

This makes fast path faster and fixes some problems with unfair write access.

Merge branch 'master' into rf-lazy-compilation

269ec3b

fix internal doc links

816dfa8

fix no_std build

02c7fee

rename EngineInner::init_func_v2 -> init_func

8e8aa3e

limit ReusableAllocationStack height to just 1

ef84061

experiment: comment out most translation benchmarks

1dd9a1e

Currently Wasm benchmark CI runs out of memory for spidermonkey lazy unchecked translation. We want to see if there are memory dependencies between the different translation benchmark runs.

Revert "experiment: comment out most translation benchmarks"

088aea9

This reverts commit 1dd9a1e.

add forgotten buffer.drain call

ed6cad3

remove commented out code

beb90be

apply wasm-opt -Oz to spidermonkey.wasm (version 116)

27a35d3

improve byte slicing

fab4845

use Self::MAX_INLINE_SIZE constant

421d864

use Self::MAX_INLINE_SIZE in more places

b25b7d0

use Self::MAX_INLINE_SIZE in more places (2)

639c306

increase MAX_INLINE_SIZE in SmallByteSlice to 30

d544417

avoid unnecessary Engine clone

d75832e

remove unnecessary slicing

9ad8c93

apply clippy suggestions

385abda

refactor translation benchmark test runner

d7bcc70

remove direct use of ModuleHeader::engine field

c1729dc

apply rustfmt

b9fb9d3

make Engine::downgrade method crate private

e64e2ef

Merge branch 'master' into rf-lazy-compilation

5cec5c5

Robbepop marked this pull request as ready for review December 16, 2023 20:35

Robbepop merged commit 1b9aae2 into master Dec 16, 2023
21 checks passed

Robbepop deleted the rf-lazy-compilation branch December 16, 2023 20:37
