Incompatible format of compiled wasm modules between wasmvm 1.2.1 and 1.2.2 #426

Closed
webmaster128 opened this issue Apr 18, 2023 · 6 comments

Comments

@webmaster128
Member

webmaster128 commented Apr 18, 2023

If you upgrade from wasmvm 1.2.{0,1} to wasmvm 1.2.{2,3}, please note that the machine format of the compiled Wasm modules has most likely changed. This leads to crashes like the following when the new version is running:

9:04AM INF ABCI Replay Blocks appHeight=14 module=consensus stateHeight=14 storeHeight=15
9:04AM INF Replay last block using real app module=consensus
9:04AM INF minted coins from module account amount=12stake from=mint module=x/bank
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x12f4bc000 pc=0x10455a90c]

runtime stack:
runtime.throw({0x101ab7b89?, 0x12f4b44a1?})
        runtime/panic.go:1047 +0x5d fp=0x7ff7bfefd3a0 sp=0x7ff7bfefd370 pc=0x10003b37d
runtime.sigpanic()
        runtime/signal_unix.go:821 +0x3e9 fp=0x7ff7bfefd400 sp=0x7ff7bfefd3a0 pc=0x100052429

goroutine 1 [syscall]:

To overcome this problem:

  1. Stop the node
  2. Delete the cache folder ~/.noisd/wasm/wasm/cache/ (replace with the location your project uses)
  3. Start the node

You might experience a small slowdown in the beginning because each .wasm module is lazily recompiled the first time it is executed.
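
The cache deletion step can also be scripted. The following is a minimal Python sketch, assuming the node is already stopped and that the cache lives at `<home>/wasm/wasm/cache` as in the example path above. `clear_wasm_cache` is a hypothetical helper name and not part of wasmvm.

```python
import shutil
from pathlib import Path


def clear_wasm_cache(node_home):
    """Delete the compiled-module cache under node_home, if present.

    Assumes the layout <node_home>/wasm/wasm/cache from the example above;
    adjust the relative path to the location your project uses.
    Returns True if a cache directory was removed, False otherwise.
    """
    cache_dir = Path(node_home) / "wasm" / "wasm" / "cache"
    if not cache_dir.is_dir():
        return False
    # Only the cache is removed; sibling directories are left untouched.
    shutil.rmtree(cache_dir)
    return True
```

For the path used in this issue, the call would be `clear_wasm_cache(Path.home() / ".noisd")`, run between stopping and starting the node.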

Thanks a lot to Reece for helping trace that down.

@webmaster128
Member Author

The 1.0.1 and 1.1.2 releases are probably not affected because the builders (i.e. the Rust version compiling libwasmvm) did not change for them.

@webmaster128
Member Author

webmaster128 commented Apr 18, 2023

I got confirmation from the rkyv chat. It seems very likely that the Rust upgrade from 1.65.0 to 1.68.2 changed the (undefined) memory layout of some Rust types, making segfaults during the deserialization of the module the expected behaviour.
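
To illustrate why a layout change breaks unchecked loading, here is a small Python analogy. It is not rkyv, and the field names and layouts are made up for illustration: bytes written with one field order are read back with another, and the reader silently gets different values. In Rust, where such bytes may be reinterpreted as pointers and lengths, the same mismatch can end in a segfault.

```python
import struct

# Hypothetical "old compiler" layout: 32-bit length first, then 16-bit flags.
OLD_LAYOUT = "<IH"
# Hypothetical "new compiler" layout: the same two fields, reordered.
NEW_LAYOUT = "<HI"


def write_old(length, flags):
    """Serialize the two fields using the old layout."""
    return struct.pack(OLD_LAYOUT, length, flags)


def read_new(blob):
    """Deserialize with the new layout, without any validation."""
    flags, length = struct.unpack(NEW_LAYOUT, blob)
    return length, flags


# Both layouts are 6 bytes, so the read succeeds but yields wrong values.
blob = write_old(4096, 1)
print(read_new(blob))  # prints (65536, 4096) instead of (4096, 1)
```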

@webmaster128
Member Author

This will be fixed in CosmWasm 1.3 and beyond, making it extremely unlikely to happen again. The fix contains two layers:

  1. Better cache invalidation. Every time the CPU of a node changes, the modules compiled for the previous CPU might not run anymore. This happens even when going from AMD <-> Intel within the x86_64 family because CPUs have different features. CosmWasm 1.3 hashes the full CPU info into the module path (e.g. ~/.noded/wasm/wasm/cache/modules/v5-wasmer17/x86_64-nintendo-fuchsia-gnu-coff-01E9F9FE/ instead of ~/.noded/wasm/wasm/cache/modules/v5-wasmer17/). See "Hash target (triple + CPU features) into module cache directory" (cosmwasm#1664).
  2. Checked rkyv deserialization in Wasmer 4. rkyv more or less dumps memory to disk (in a smart way) and loads those dumps back into memory. The unchecked version used so far (up to Wasmer 3) can load any broken data. There is a way of checking that those dumps are valid for the current target structure in memory. As a result, you get proper Rust errors instead of crashes or undefined behaviour when a module is not in the correct format.
    See https://github.com/wasmerio/wasmer/blob/master/CHANGELOG.md and "Update to wasmer 4" (cosmwasm#1701).
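
The first layer can be sketched roughly as follows. This is an illustrative Python sketch of the idea, not the CosmWasm implementation: the use of CRC32, the way the target string is assembled, and the helper name `module_cache_dir` are all assumptions. Only the directory shape (`<version>/<triple>-<8 hex digits>`) is taken from the example path above.

```python
import zlib
from pathlib import PurePosixPath


def module_cache_dir(base, version, triple, cpu_features):
    """Build a cache directory that is specific to the compilation target.

    Hypothetical helper: hashes the target triple plus the CPU feature set
    so that modules compiled for a different CPU land in another directory.
    """
    # Sort the features so the hash does not depend on detection order.
    target = triple + "|" + ",".join(sorted(cpu_features))
    digest = zlib.crc32(target.encode("utf-8"))
    return PurePosixPath(base) / version / f"{triple}-{digest:08X}"
```

With this shape, a node that moves to a CPU with a different feature set computes a different directory, finds it empty, and recompiles instead of loading modules built for the old CPU.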

@webmaster128
Member Author

webmaster128 commented Jun 1, 2023

This issue affects more migration paths than I originally thought.

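| wasmvm | 1.0.0 | 1.0.1 | 1.1.0 | 1.1.1 | 1.1.2 | 1.2.0 | 1.2.1 | 1.2.2 | 1.2.3 |
|---|---|---|---|---|---|---|---|---|---|
| 1.0.0 | | not affected [1] | ⚠️ ? | not affected [2] | not affected [2] | not affected [2] | not affected [2] | not affected [2] | not affected [2] |
| 1.0.1 | | | ⚠️ ? | not affected [2] | not affected [2] | not affected [2] | not affected [2] | not affected [2] | not affected [2] |
| 1.1.0 | | | | not affected [2] | not affected [2] | not affected [2] | not affected [2] | not affected [2] | not affected [2] |
| 1.1.1 | | | | | not affected [1] | ⚠️ ? | ⚠️ ? | 🚨 affected | 🚨 affected [3] |
| 1.1.2 | | | | | | ⚠️ ? | ⚠️ ? | 🚨 affected | 🚨 affected |
| 1.2.0 | | | | | | | not affected [1] | 🚨 affected | 🚨 affected |
| 1.2.1 | | | | | | | | 🚨 affected | 🚨 affected |
| 1.2.2 | | | | | | | | | not affected [4] |
| 1.2.3 | | | | | | | | | |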

Footnotes

  1. Cherry patch, applies just fine

  2. Contains cache invalidation through MODULE_SERIALIZATION_VERSION

  3. This hit the Injective mainnet upgrade

  4. Same Wasmer and builders version

@webmaster128
Member Author

wasmvm 1.2.4 invalidates all previous caches to avoid potential issues, no matter which version you are upgrading from.
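
One general way to invalidate every existing cache at once is to bump a version component in the cache path, so that modules written by older releases are never looked up again. The Python sketch below shows that technique only (compare MODULE_SERIALIZATION_VERSION in footnote 2). It is an assumption, not a description of what 1.2.4 does internally, and the function names and version strings are hypothetical.

```python
from pathlib import Path


def store_cached_module(cache_root, serialization_version, checksum, data):
    """Write a compiled module under a version-specific directory."""
    directory = Path(cache_root) / "modules" / serialization_version
    directory.mkdir(parents=True, exist_ok=True)
    (directory / checksum).write_bytes(data)


def load_cached_module(cache_root, serialization_version, checksum):
    """Return the cached module bytes, or None on a cache miss.

    A miss means the caller recompiles the module from the original .wasm.
    """
    path = Path(cache_root) / "modules" / serialization_version / checksum
    if not path.is_file():
        return None
    return path.read_bytes()
```

A release that changes the version string sees only cache misses for old entries and recompiles lazily, which matches the small initial slowdown described earlier in this thread.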

@webmaster128
Member Author

I consider this done by the 1.2.4 patch release as well as work in 1.3 that will improve the situation even more.
