Implement lazy Wasm to wasmi bytecode translation #844

Merged: 77 commits merged on Dec 16, 2023

Conversation

Robbepop (Member) commented on Dec 9, 2023

Closes #732.
Closes #516.

Benchmarks

Execution

Local benchmarks so far show that call-intensive workloads regress by 10-15% even when functions are compiled eagerly. This is bad, since ideally only lazily compiled functions would incur a performance penalty.
It might be possible to improve the situation by speeding up how CodeMap returns compiled functions.
Furthermore, call-intensive workloads are fairly rare in practice, and compute-intensive workloads are not significantly affected by the changes introduced with lazy compilation.

Translation

After getting lazy translation up and running we were able to gather some promising benchmarks:

Spidermonkey

  • lazy+validated speed-up over eager+validated: 2.1x
  • lazy+unchecked speed-up over eager+unchecked: 8.4x
  • lazy+unchecked speed-up over eager+validated: 10.2x
translate/spidermonkey/checked/eager/default
    time:   [46.189 ms 46.407 ms 46.621 ms]
translate/spidermonkey/checked/eager/fuel
    time:   [47.625 ms 47.845 ms 47.989 ms]
translate/spidermonkey/checked/lazy/default
    time:   [21.786 ms 21.822 ms 21.896 ms]
translate/spidermonkey/unchecked/eager/default
    time:   [37.774 ms 38.120 ms 38.444 ms]
translate/spidermonkey/unchecked/eager/fuel
    time:   [39.838 ms 39.888 ms 39.955 ms]
translate/spidermonkey/unchecked/lazy/default
    time:   [4.5235 ms 4.5328 ms 4.5440 ms]

ERC-20

  • lazy+validated speed-up over eager+validated: 1.8x
  • lazy+unchecked speed-up over eager+unchecked: 2.7x
  • lazy+unchecked speed-up over eager+validated: 3.2x
translate/erc20/checked/eager/default
    time:   [76.379 µs 77.450 µs 79.144 µs]
translate/erc20/checked/eager/fuel
    time:   [81.388 µs 81.930 µs 82.994 µs]
translate/erc20/checked/lazy/default
    time:   [42.184 µs 42.383 µs 42.683 µs]
translate/erc20/unchecked/eager/default
    time:   [65.050 µs 65.228 µs 65.471 µs]
translate/erc20/unchecked/eager/fuel
    time:   [68.156 µs 68.341 µs 68.619 µs]
translate/erc20/unchecked/lazy/default
    time:   [24.212 µs 24.328 µs 24.413 µs]

ERC-1155

  • lazy+validated speed-up over eager+validated: 1.95x
  • lazy+unchecked speed-up over eager+unchecked: 4.0x
  • lazy+unchecked speed-up over eager+validated: 4.6x
translate/erc1155/checked/eager/default
    time:   [157.34 µs 157.81 µs 158.31 µs]
translate/erc1155/checked/eager/fuel
    time:   [169.16 µs 169.41 µs 169.66 µs]
translate/erc1155/checked/lazy/default
    time:   [79.642 µs 80.586 µs 81.850 µs]
translate/erc1155/unchecked/eager/default
    time:   [135.57 µs 136.02 µs 136.49 µs]
translate/erc1155/unchecked/eager/fuel
    time:   [146.67 µs 147.04 µs 147.25 µs]
translate/erc1155/unchecked/lazy/default
    time:   [33.641 µs 34.011 µs 34.278 µs]

Translation Benchmarks: Conclusion

  • We see roughly 2x speed-up with lazy mode compared to eager when validating the input.
  • We see roughly 3-8x speed-up with lazy mode compared to eager without validating the input.
  • We see roughly 3-10x speed-up with unchecked lazy mode compared to validated eager mode.
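
The speed-up factors quoted for each benchmark are ratios of the median times (the middle value of each `time: [...]` triple). As a minimal sketch, the following Rust program recomputes them from the numbers listed above; the `Timings` struct and the `speedup` helper are illustrative names and not part of wasmi.

```rust
/// Median translation times copied from the benchmark output above.
/// Each row uses one unit throughout (ms for spidermonkey, µs for the
/// others); the ratios are unitless, so rows need not share a unit.
struct Timings {
    name: &'static str,
    eager_checked: f64,
    lazy_checked: f64,
    eager_unchecked: f64,
    lazy_unchecked: f64,
}

/// Speed-up factor of the `fast` configuration over the `slow` one.
fn speedup(slow: f64, fast: f64) -> f64 {
    slow / fast
}

fn main() {
    let runs = [
        Timings {
            name: "spidermonkey",
            eager_checked: 46.407,
            lazy_checked: 21.822,
            eager_unchecked: 38.120,
            lazy_unchecked: 4.5328,
        },
        Timings {
            name: "erc20",
            eager_checked: 77.450,
            lazy_checked: 42.383,
            eager_unchecked: 65.228,
            lazy_unchecked: 24.328,
        },
        Timings {
            name: "erc1155",
            eager_checked: 157.81,
            lazy_checked: 80.586,
            eager_unchecked: 136.02,
            lazy_unchecked: 34.011,
        },
    ];

    for t in &runs {
        println!(
            "{}: lazy+validated/eager+validated {:.2}x, \
             lazy+unchecked/eager+unchecked {:.2}x, \
             lazy+unchecked/eager+validated {:.2}x",
            t.name,
            speedup(t.eager_checked, t.lazy_checked),
            speedup(t.eager_unchecked, t.lazy_unchecked),
            speedup(t.eager_checked, t.lazy_unchecked),
        );
    }
}
```

For spidermonkey this gives roughly 2.1x, 8.4x, and 10.2x, in line with the figures listed for that benchmark.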

TODOs

  • Add CompilationMode to Config
  • Refactor ModuleBuilder to remove unnecessary lifetime annotations.
  • Add lazy Wasm function translation during Wasm module parsing.
  • Add lazy Wasm function translation when calling a lazily compiled Wasm function the first time.
  • Return wasmi::Error from the Wasmi instruction executor instead of TrapCode.
    • This is required because now call instructions may fail with TranslationError.
  • Fix performance penalties for call-intensive workloads if possible.
  • Make it technically impossible to race in CodeMap::get.
    • Currently, a RwLock is used, which is unfair towards writers. However, lazily compiling a function requires write access. This could be fixed by introducing another state that threads can query while they wait for the function to be compiled.
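
The last TODO item can be illustrated with a small, self-contained sketch of a per-function entry that has an extra "compiling" state. This is not the wasmi implementation: `FuncState`, `FuncEntry`, and `get_or_compile` are hypothetical names, a `u32` stands in for the compiled function, and a `Mutex` plus `Condvar` is just one possible way to let waiting threads observe the intermediate state.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical per-function state; `Compiling` is the extra state
/// that waiting threads query.
#[derive(Debug, PartialEq)]
enum FuncState {
    Uncompiled,
    Compiling,
    Compiled(u32), // stand-in for the compiled function
}

/// Hypothetical code map entry guarding a single function.
struct FuncEntry {
    state: Mutex<FuncState>,
    ready: Condvar,
}

impl FuncEntry {
    fn new() -> Self {
        Self {
            state: Mutex::new(FuncState::Uncompiled),
            ready: Condvar::new(),
        }
    }

    /// Returns the compiled function, running `compile` at most once
    /// no matter how many threads call this concurrently.
    fn get_or_compile(&self, compile: impl FnOnce() -> u32) -> u32 {
        let mut state = self.state.lock().unwrap();
        // Another thread is compiling: sleep until it signals completion.
        while *state == FuncState::Compiling {
            state = self.ready.wait(state).unwrap();
        }
        if let FuncState::Compiled(code) = *state {
            return code;
        }
        // The state is `Uncompiled`: claim the compilation for this thread.
        *state = FuncState::Compiling;
        drop(state); // do not hold the lock while compiling
        let code = compile();
        *self.state.lock().unwrap() = FuncState::Compiled(code);
        self.ready.notify_all();
        code
    }
}

fn main() {
    let entry = Arc::new(FuncEntry::new());
    let compilations = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let entry = Arc::clone(&entry);
            let compilations = Arc::clone(&compilations);
            thread::spawn(move || {
                entry.get_or_compile(|| {
                    compilations.fetch_add(1, Ordering::SeqCst);
                    42
                })
            })
        })
        .collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 42);
    }
    // Exactly one thread performed the compilation.
    assert_eq!(compilations.load(Ordering::SeqCst), 1);
}
```

The sketch ignores failed compilation; since translation may fail, a real implementation would also need a way to leave the `Compiling` state on error and report it to the waiting threads.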

paritytech-cicd-pr commented on Dec 9, 2023

BENCHMARKS

BENCHMARK | NATIVE MASTER | NATIVE PR | NATIVE DIFF | WASMTIME MASTER | WASMTIME PR | WASMTIME DIFF | WASMTIME OVERHEAD
execute/br_table | 1.51ms | 1.45ms | 🟢 -3.87% | 1.33ms | 1.27ms | 🟢 -4.49% | 🟢 -13%
execute/call/host/1 | 45.50µs | 53.98µs | 🔴 18.64% | 63.18µs | 67.95µs | 🔴 7.55% | 🟢 26%
execute/call/rec | 166.01µs | 181.91µs | 🔴 9.58% | 343.20µs | 368.13µs | 🔴 7.26% | 🔴 102%
execute/count_until | 7.48ms | 6.53ms | 🟢 -12.69% | 7.48ms | 7.48ms | ⚪ -0.02% | 🟢 15%
execute/divrem | 6.27ms | 6.22ms | ⚪ -0.78% | 6.98ms | 8.43ms | 🔴 20.85% | 🟢 36%
execute/factorial/iter | 259.75µs | 264.79µs | 🔴 1.94% | 312.39µs | 333.98µs | 🔴 6.91% | 🟢 26%
execute/factorial/rec | 949.59µs | 767.60µs | 🟢 -19.16% | 1.24ms | 1.36ms | 🔴 9.73% | 🟡 77%
execute/fibonacci/iter | 1.29ms | 1.36ms | 🔴 5.75% | 1.28ms | 1.27ms | ⚪ -0.63% | 🟢 -7%
execute/fibonacci/rec | 6.03ms | 6.55ms | 🔴 8.70% | 12.70ms | 13.85ms | 🔴 9.03% | 🔴 111%
execute/fibonacci/tail | 1.70ms | 1.44ms | 🟢 -15.24% | 3.89ms | 3.78ms | 🟢 -2.86% | 🔴 161%
execute/fuse | 7.72ms | 7.19ms | 🟢 -6.92% | 11.60ms | 12.20ms | 🔴 5.15% | 🟡 70%
execute/global/bump | 1.32ms | 1.32ms | ⚪ 0.13% | 1.55ms | 1.62ms | 🔴 4.64% | 🟢 23%
execute/global/get_const | 733.61µs | 687.96µs | 🟢 -6.22% | 747.12µs | 750.70µs | ⚪ 0.48% | 🟢 9%
execute/is_even/rec | 1.06ms | 1.18ms | 🔴 10.88% | 2.17ms | 2.36ms | 🔴 8.63% | 🟡 100%
execute/memory/fill_bytes | 1.09ms | 1.12ms | 🔴 2.66% | 1.41ms | 1.34ms | 🟢 -4.94% | 🟢 20%
execute/memory/sum_bytes | 1.04ms | 1.15ms | 🔴 11.06% | 1.23ms | 1.32ms | 🔴 7.14% | 🟢 14%
execute/memory/vec_add | 2.96ms | 2.95ms | ⚪ -0.57% | 3.59ms | 3.89ms | 🔴 8.34% | 🟢 32%
execute/recursive_scan | 188.56µs | 202.18µs | 🔴 7.22% | 376.58µs | 396.16µs | 🔴 5.20% | 🟡 96%
execute/recursive_trap | 15.31µs | 17.33µs | 🔴 13.16% | 34.22µs | 37.21µs | 🔴 8.74% | 🔴 115%
execute/regex_redux | 591.39µs | 582.21µs | 🟢 -1.55% | 1.10ms | 1.06ms | 🟢 -3.49% | 🟡 82%
execute/rev_complement | 443.17µs | 456.22µs | 🔴 2.94% | 674.50µs | 668.24µs | ⚪ -0.93% | 🟢 46%
execute/tiny_keccak | 347.50µs | 352.09µs | 🔴 1.32% | 386.87µs | 384.19µs | ⚪ -0.69% | 🟢 9%
execute/trunc_f2i | 613.22µs | 613.79µs | ⚪ 0.09% | 963.86µs | 1.01ms | 🔴 4.44% | 🟡 64%
instantiate/wasm_kernel | 56.13µs | 54.05µs | 🟢 -3.71% | 56.97µs | 52.88µs | 🟢 -7.16% | 🟢 -2%
overhead/call/typed/0 | 1.19ms | 1.26ms | 🔴 5.79% | 754.30µs | 874.08µs | 🔴 15.88% | 🟢 -31%
overhead/call/typed/16 | 1.61ms | 1.67ms | 🔴 3.21% | 2.09ms | 1.98ms | 🟢 -5.10% | 🟢 19%
overhead/call/untyped/0 | 1.62ms | 1.57ms | 🟢 -2.89% | 1.26ms | 1.20ms | 🟢 -4.63% | 🟢 -24%
overhead/call/untyped/16 | 2.48ms | 2.45ms | ⚪ -1.34% | 4.05ms | 3.77ms | 🟢 -7.12% | 🟡 54%
translate/bz2/checked/eager/default | 1.36ms | 1.32ms | 🟢 -3.11% | 2.36ms | 2.39ms | ⚪ 1.33% | 🟡 81%
translate/bz2/checked/eager/fuel | 1.47ms | 1.43ms | 🟢 -3.12% | 2.56ms | 2.62ms | 🔴 2.14% | 🟡 83%
translate/bz2/checked/lazy/default | 1.37ms | 547.76µs | 🟢 -59.98% | 2.37ms | 981.71µs | 🟢 -58.59% | 🟡 79%
translate/bz2/unchecked/eager/default | 1.10ms | 1.07ms | 🟢 -2.96% | 1.76ms | 1.83ms | 🔴 3.81% | 🟡 71%
translate/bz2/unchecked/eager/fuel | 1.19ms | 1.17ms | 🟢 -2.09% | 1.95ms | 2.04ms | 🔴 4.41% | 🟡 74%
translate/bz2/unchecked/lazy/default | 1.10ms | 35.32µs | 🟢 -96.79% | 1.77ms | 45.53µs | 🟢 -97.42% | 🟢 29%
translate/erc1155/checked/eager/default | 281.89µs | 276.76µs | 🟢 -1.82% | 475.53µs | 469.57µs | ⚪ -1.25% | 🟡 70%
translate/erc1155/checked/eager/fuel | 302.31µs | 296.16µs | 🟢 -2.04% | 505.29µs | 509.78µs | ⚪ 0.89% | 🟡 72%
translate/erc1155/checked/lazy/default | 282.13µs | 128.98µs | 🟢 -54.28% | 474.11µs | 213.61µs | 🟢 -54.95% | 🟡 66%
translate/erc1155/unchecked/eager/default | 233.32µs | 228.82µs | 🟢 -1.93% | 361.94µs | 364.84µs | ⚪ 0.80% | 🟡 59%
translate/erc1155/unchecked/eager/fuel | 251.51µs | 245.76µs | 🟢 -2.29% | 386.27µs | 395.53µs | 🔴 2.40% | 🟡 61%
translate/erc1155/unchecked/lazy/default | 232.73µs | 24.78µs | 🟢 -89.35% | 359.69µs | 31.76µs | 🟢 -91.17% | 🟢 28%
translate/erc20/checked/eager/default | 135.25µs | 135.12µs | ⚪ -0.10% | 228.16µs | 226.71µs | ⚪ -0.64% | 🟡 68%
translate/erc20/checked/eager/fuel | 143.89µs | 142.15µs | 🟢 -1.21% | 239.47µs | 240.59µs | ⚪ 0.47% | 🟡 69%
translate/erc20/checked/lazy/default | 135.67µs | 65.85µs | 🟢 -51.46% | 229.13µs | 108.05µs | 🟢 -52.84% | 🟡 64%
translate/erc20/unchecked/eager/default | 112.54µs | 110.64µs | 🟢 -1.69% | 174.40µs | 174.60µs | ⚪ 0.11% | 🟡 58%
translate/erc20/unchecked/eager/fuel | 119.70µs | 118.15µs | 🟢 -1.30% | 183.48µs | 186.79µs | 🔴 1.80% | 🟡 58%
translate/erc20/unchecked/lazy/default | 112.42µs | 19.09µs | 🟢 -83.02% | 175.05µs | 24.22µs | 🟢 -86.17% | 🟢 27%
translate/erc721/checked/eager/default | 194.19µs | 191.41µs | 🟢 -1.43% | 328.74µs | 327.90µs | ⚪ -0.26% | 🟡 71%
translate/erc721/checked/eager/fuel | 204.90µs | 201.18µs | 🟢 -1.81% | 344.00µs | 346.66µs | ⚪ 0.77% | 🟡 72%
translate/erc721/checked/lazy/default | 194.47µs | 92.37µs | 🟢 -52.50% | 328.22µs | 153.96µs | 🟢 -53.09% | 🟡 67%
translate/erc721/unchecked/eager/default | 158.23µs | 154.31µs | 🟢 -2.48% | 247.99µs | 248.70µs | ⚪ 0.29% | 🟡 61%
translate/erc721/unchecked/eager/fuel | 166.55µs | 164.02µs | 🟢 -1.52% | 267.10µs | 266.02µs | ⚪ -0.40% | 🟡 62%
translate/erc721/unchecked/lazy/default | 158.25µs | 21.75µs | 🟢 -86.25% | 249.39µs | 27.87µs | 🟢 -88.83% | 🟢 28%
translate/pulldown_cmark/checked/eager/default | 3.62ms | 3.58ms | ⚪ -1.15% | 6.03ms | 6.15ms | 🔴 1.97% | 🟡 72%
translate/pulldown_cmark/checked/eager/fuel | 3.90ms | 3.87ms | ⚪ -0.69% | 6.49ms | 6.80ms | 🔴 4.76% | 🟡 76%
translate/pulldown_cmark/checked/lazy/default | 3.62ms | 1.55ms | 🟢 -57.25% | 6.10ms | 2.60ms | 🟢 -57.34% | 🟡 68%
translate/pulldown_cmark/unchecked/eager/default | 3.03ms | 2.99ms | 🟢 -1.40% | 4.66ms | 4.76ms | 🔴 2.21% | 🟡 59%
translate/pulldown_cmark/unchecked/eager/fuel | 3.29ms | 3.26ms | ⚪ -1.03% | 5.07ms | 5.26ms | 🔴 3.74% | 🟡 62%
translate/pulldown_cmark/unchecked/lazy/default | 3.04ms | 243.16µs | 🟢 -92.00% | 4.67ms | 244.95µs | 🟢 -94.75% | 🟢 1%
translate/spidermonkey/checked/eager/default | 76.07ms | 75.69ms | ⚪ -0.51% | 131.98ms | 133.34ms | ⚪ 1.03% | 🟡 76%
translate/spidermonkey/checked/eager/fuel | 82.22ms | 82.27ms | ⚪ 0.07% | 141.53ms | 145.19ms | 🔴 2.58% | 🟡 76%
translate/spidermonkey/checked/lazy/default | 76.09ms | 32.64ms | 🟢 -57.11% | 131.53ms | 56.57ms | 🟢 -57.00% | 🟡 73%
translate/spidermonkey/unchecked/eager/default | 62.93ms | 62.34ms | ⚪ -0.93% | 100.48ms | 102.50ms | 🔴 2.01% | 🟡 64%
translate/spidermonkey/unchecked/eager/fuel | 68.68ms | 68.29ms | ⚪ -0.56% | 110.13ms | 112.83ms | 🔴 2.45% | 🟡 65%
translate/spidermonkey/unchecked/lazy/default | 62.87ms | 3.05ms | 🟢 -95.14% | 100.33ms | 3.71ms | 🟢 -96.30% | 🟢 22%
translate/wasm_kernel/checked/eager/default | 5.04ms | 5.04ms | ⚪ 0.07% | 8.63ms | 8.69ms | ⚪ 0.70% | 🟡 72%
translate/wasm_kernel/checked/eager/fuel | 5.17ms | 5.20ms | ⚪ 0.61% | 9.05ms | 9.20ms | 🔴 1.58% | 🟡 77%
translate/wasm_kernel/checked/lazy/default | 5.02ms | 2.42ms | 🟢 -51.76% | 8.61ms | 4.07ms | 🟢 -52.74% | 🟡 68%
translate/wasm_kernel/unchecked/eager/default | 4.06ms | 4.07ms | ⚪ 0.27% | 6.55ms | 6.67ms | 🔴 1.88% | 🟡 64%
translate/wasm_kernel/unchecked/eager/fuel | 4.20ms | 4.23ms | ⚪ 0.70% | 6.96ms | 7.14ms | 🔴 2.49% | 🟡 69%
translate/wasm_kernel/unchecked/lazy/default | 4.06ms | 394.31µs | 🟢 -90.29% | 6.58ms | 472.61µs | 🟢 -92.82% | 🟢 20%


codecov-commenter commented on Dec 9, 2023

Codecov Report

Attention: 297 lines in your changes are missing coverage. Please review.

Comparison is base (ac00319) 80.90% compared to head (5cec5c5) 81.39%.

Files | Patch % | Lines
crates/wasmi/src/engine/code_map.rs | 28.97% | 76 Missing ⚠️
crates/wasmi/src/engine/mod.rs | 58.58% | 41 Missing ⚠️
crates/wasmi/src/error.rs | 46.03% | 34 Missing ⚠️
crates/wasmi/src/engine/translator/mod.rs | 70.09% | 32 Missing ⚠️
crates/wasmi/src/module/parser.rs | 84.17% | 22 Missing ⚠️
crates/wasmi/src/engine/translator/error.rs | 0.00% | 15 Missing ⚠️
crates/wasmi/src/module/builder.rs | 88.31% | 9 Missing ⚠️
crates/wasmi/src/module/mod.rs | 84.74% | 9 Missing ⚠️
crates/wasmi/src/engine/bytecode/utils.rs | 45.45% | 6 Missing ⚠️
crates/wasmi/src/engine/executor/instrs.rs | 16.66% | 5 Missing ⚠️
... and 20 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #844      +/-   ##
==========================================
+ Coverage   80.90%   81.39%   +0.48%     
==========================================
  Files         257      256       -1     
  Lines       22665    22772     +107     
==========================================
+ Hits        18338    18535     +197     
+ Misses       4327     4237      -90     


This fixes a problem in relink_result: CompiledFunc information (oftentimes results.len()) is not available at the time it is required because the compiled function entities are still uninitialized. Using ModuleHeader instead fixes this issue, which should improve codegen in these situations and make codegen order-independent.
Required in last commit. (oops)
- This divides function entities into CompiledFuncEntity for eager translation and UncompiledFuncEntity for lazy translation.
- This commit does not yet dispatch on UncompiledFuncEntity during execution of call instructions.
- Furthermore this commit does not yet use the new LazyFuncTranslator to actually translate Wasm functions lazily.
This allows us to properly handle failed lazy translations in call instruction executions.
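
The split into compiled and uncompiled function entities, and the way a call can now fail with a translation error, can be sketched as follows. This is a simplified illustration under stated assumptions, not the wasmi code: the `CodeMap`, `FuncEntity`, `Error`, and `CompilationMode` types below are hypothetical stand-ins that only borrow names from this PR, and `translate` is a toy translator that rejects empty function bodies.

```rust
/// Hypothetical stand-in for the `CompilationMode` config option.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CompilationMode {
    Eager,
    Lazy,
}

/// Hypothetical translation failure.
#[derive(Debug, PartialEq)]
struct TranslationError(String);

/// Hypothetical execution error: a call can fail with a translation
/// error in addition to an ordinary trap.
#[derive(Debug, PartialEq)]
enum Error {
    Trap(&'static str),
    Translation(TranslationError),
}

/// A function is stored either as raw Wasm or as translated bytecode.
enum FuncEntity {
    Uncompiled { wasm_body: Vec<u8> },
    Compiled { bytecode: Vec<u8> },
}

/// Toy translator: rejects empty bodies, otherwise copies the input.
fn translate(wasm_body: &[u8]) -> Result<Vec<u8>, TranslationError> {
    if wasm_body.is_empty() {
        return Err(TranslationError("empty function body".to_string()));
    }
    Ok(wasm_body.to_vec())
}

#[derive(Default)]
struct CodeMap {
    funcs: Vec<FuncEntity>,
}

impl CodeMap {
    /// Called while parsing a module: eager mode translates right away,
    /// lazy mode only stores the function body for later.
    fn register(
        &mut self,
        wasm_body: Vec<u8>,
        mode: CompilationMode,
    ) -> Result<usize, TranslationError> {
        let entity = match mode {
            CompilationMode::Eager => FuncEntity::Compiled {
                bytecode: translate(&wasm_body)?,
            },
            CompilationMode::Lazy => FuncEntity::Uncompiled { wasm_body },
        };
        self.funcs.push(entity);
        Ok(self.funcs.len() - 1)
    }

    /// "Calls" a function, translating it on first use. Returns the
    /// bytecode length as a stand-in for actually executing it.
    fn call(&mut self, index: usize) -> Result<usize, Error> {
        let entity = self
            .funcs
            .get_mut(index)
            .ok_or(Error::Trap("unknown function"))?;
        let bytecode = match &*entity {
            FuncEntity::Compiled { bytecode } => return Ok(bytecode.len()),
            FuncEntity::Uncompiled { wasm_body } => {
                translate(wasm_body).map_err(Error::Translation)?
            }
        };
        let len = bytecode.len();
        // Cache the result so later calls take the fast path.
        *entity = FuncEntity::Compiled { bytecode };
        Ok(len)
    }
}

fn main() {
    let mut map = CodeMap::default();
    let good = map.register(vec![1, 2, 3], CompilationMode::Lazy).unwrap();
    let bad = map.register(Vec::new(), CompilationMode::Lazy).unwrap();
    println!("{:?}", map.call(good));
    println!("{:?}", map.call(bad));
}
```

In this sketch an invalid function is only reported when it is first called in lazy mode, whereas eager mode reports it at registration time; that difference is why the executor needs an error type that can carry a translation failure.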
Now wasmi::Error takes over responsibilities of Trap.
This makes it possible to remove an unnecessary Box indirection.
This makes the fast path faster and fixes some problems with unfair write access.
Currently Wasm benchmark CI runs out of memory for spidermonkey lazy unchecked translation. We want to see if there are memory dependencies between the different translation benchmark runs.
The cycle existed because Engine held ModuleHeader which itself held Engine.
The cycle was broken by introducing EngineWeak, which is just a fancy wrapper around a Weak pointer to an Engine, and making ModuleHeader hold EngineWeak instead of Engine. Therefore, Engine access via ModuleHeader may now fail if the Engine no longer exists. However, since ModuleHeader is only accessed via its Engine, this should technically never occur.
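
A minimal sketch of this kind of cycle breaking, using only the standard library: the type names mirror the commit message, but the fields and methods (`EngineInner`, `headers`, `register_module`, `downgrade`, `upgrade`) are assumptions made for illustration and are not taken from the wasmi source.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical shared engine state that owns the module headers.
struct EngineInner {
    headers: Mutex<Vec<ModuleHeader>>,
}

/// Strong handle: keeps the engine alive.
#[derive(Clone)]
struct Engine {
    inner: Arc<EngineInner>,
}

/// Weak handle: does not keep the engine alive.
#[derive(Clone)]
struct EngineWeak {
    inner: Weak<EngineInner>,
}

/// Holds only a weak handle, so Engine -> ModuleHeader -> Engine
/// no longer forms a reference cycle.
struct ModuleHeader {
    engine: EngineWeak,
}

impl Engine {
    fn new() -> Self {
        Self {
            inner: Arc::new(EngineInner {
                headers: Mutex::new(Vec::new()),
            }),
        }
    }

    fn downgrade(&self) -> EngineWeak {
        EngineWeak {
            inner: Arc::downgrade(&self.inner),
        }
    }

    /// Stores a header that points back at this engine weakly.
    fn register_module(&self) {
        let header = ModuleHeader {
            engine: self.downgrade(),
        };
        self.inner.headers.lock().unwrap().push(header);
    }
}

impl EngineWeak {
    /// Fails with `None` once the engine has been dropped.
    fn upgrade(&self) -> Option<Engine> {
        self.inner.upgrade().map(|inner| Engine { inner })
    }
}

impl ModuleHeader {
    fn engine(&self) -> Option<Engine> {
        self.engine.upgrade()
    }
}

fn main() {
    let engine = Engine::new();
    engine.register_module();
    // The stored header does not add a strong reference.
    assert_eq!(Arc::strong_count(&engine.inner), 1);
    // While the engine is alive, the header can reach it.
    assert!(engine.inner.headers.lock().unwrap()[0].engine().is_some());

    let weak = engine.downgrade();
    drop(engine);
    // With no strong handles left, the engine is freed and upgrade fails.
    assert!(weak.upgrade().is_none());
}
```

Had `ModuleHeader` held a strong `Engine` here, the strong count would never reach zero and the engine would leak after the last external handle was dropped.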
@Robbepop Robbepop marked this pull request as ready for review December 16, 2023 20:35
@Robbepop Robbepop merged commit 1b9aae2 into master Dec 16, 2023
21 checks passed
@Robbepop Robbepop deleted the rf-lazy-compilation branch December 16, 2023 20:37
Development

Successfully merging this pull request may close these issues.

  • Add lazy Wasm compilation
  • Refactor wasmi error types and make them simpler and more efficient to use
3 participants