-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to rustc 1.70 #9140
Update to rustc 1.70 #9140
Conversation
How can rust upgrade lead to changes in gas costs? @jakmeier https://buildkite.com/nearprotocol/nearcore/builds/28207#01887919-bfe1-418c-814e-ebef03e675be |
Hah, that's an interesting one! I think we are in UB territory. At least this error message is familiar:
It makes me think we are running into the problem again that our contract serialization format is rust version dependent. (see #9025) cc @nagisa maybe you have some immediate thoughts/ideas? |
I won't be able to look at this before Monday, but what I remember is that deserialization issues in rkyv were segfaults, rather than regular errors. Though, I guess once you violate soundness, anything could happen. |
Looks like the new LLVM with the 1.70 has started producing the instructions from the sign extension and other now-mainline WebAssembly proposals by default. This and these other extensions are not supported by the pwasm parser. Unfortunately the relevant release notes are absent. But this appears to be the relevant differential. The proposed way forward is for us to use Note that in particular nearcore is not the only affected party – any contract developer who upgrades to Rust 1.70 will be affected unless they utilize the workaround. It might be worthwhile to communicate this more broadly somehow. cc @akhi3030 @bowenwang1996 One way to do so would be to create and pin a github issue, but there may be other channels too. |
Note that this above can also explain the gas cost changes too – LLVM upgrades are prone to affecting the resulting test WebAssembly contracts’ code sufficiently that they execute fewer or more instructions. To avoid this we should make sure to store our test contracts as WebAssembly if we depend on precise execution outcome when executing those contracts. |
Thanks, I'll look into this. Do we know how things will break for the contract developers? Will they get an error when trying to compile / deploy on localnet / deploy on mainnet / etc. Or will the problem be more subtle? |
So, is this only a limitation of pwasm parser then? I would image native code generation also doesn't know to handle it, if the parsing stage couldn't output these instructions, yet. Is this the case? Or is it possible we only need a pwasm parser update and would be ready to support these new WASM features? |
We would at very least require validation of these extensions, but the code is in place for these instructions in NearVM. |
They will get deployment failures any time they compile their contracts with 1.70+ and the contract is complex enough to convince LLVM to use the new instructions. Given our anemic diagnostics they will definitely have a hard time figuring out why the deployment fails though :) |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Using instructions in #9143, the following patch specifically would get most of the failing tests to pass: diff --git a/runtime/near-test-contracts/build.rs b/runtime/near-test-contracts/build.rs
index 1ec245566..8035897a5 100644
--- a/runtime/near-test-contracts/build.rs
+++ b/runtime/near-test-contracts/build.rs
@@ -53,10 +53,11 @@ fn cargo_build_cmd(target_dir: &Path) -> Command {
res.env_remove("CARGO_BUILD_RUSTFLAGS");
res.env_remove("CARGO_ENCODED_RUSTFLAGS");
- res.env("RUSTFLAGS", "-Dwarnings");
+ res.env("RUSTFLAGS", "-Dwarnings -Ctarget-cpu=mvp");
res.env("CARGO_TARGET_DIR", target_dir);
+ res.env("RUSTC_BOOTSTRAP", "1");
- res.args(["build", "--target=wasm32-unknown-unknown", "--release"]);
+ res.args(["build", "--target=wasm32-unknown-unknown", "--release", "-Zbuild-std=panic_abort,std"]);
res
} That said, I don't believe it is something we necessarily want to do due to this requiring |
So we have a couple of remaining problems.
|
`base_test_contract_rs.wasm` is a WASM used in our tests that is supposed to be compatible with the oldest version. That currently requires rustc < 1.70 or some hacks to use unstable cargo features. (`-Zbuild-std`) The easiest way forward is to check in a WASM built with the correct settings instead or relying on the nearcore toolchain version. This is a pre-requisite to land the 1.70 toolchain upgrade. (#9140)
`rs_contract` can be used by tests that run only on the latest protocol version, as defined by the const PROTOCOL_VERSION. `base_rs_contract` must be used in backwards compatibility tests that rely on specific protocol versions. This is a pre-requisite to land the rustc 1.70 toolchain upgrade. (See also near#9140)
The gas changes were actually due to receipts failing rather than executing, which meant they used less gas. Updating to use a compatible WASM fixes that for free. That means, maybe we only need one more fix, which is to use |
`rs_contract` can be used by tests that run only on the latest protocol version, as defined by the const PROTOCOL_VERSION. `backwards_compatible_rs_contract` must be used in backwards compatibility tests that rely on specific protocol versions. This is a pre-requisite to land the rustc 1.70 toolchain upgrade. (See also #9140)
The remaining issues seem to be of a single cause, which is pointer memory alignment issue.
At a first glance, it seems like it's only an issue for wasmer2, not for near-vm. But I'm not quite sure yet what is going on. @nagisa / @Ekleog-NEAR do you know why this is failing with the rustc upgrade? error message for reference: cargo nextest run --workspace --features nightly,test_features,mock_node -p '*' --exclude 'integration-tests*' -- memory_size_alignment_issue
Starting 1 tests across 91 binaries (1397 skipped)
SIGABRT [ 0.101s] near-vm-runner tests::regression_tests::memory_size_alignment_issue
--- STDOUT: near-vm-runner tests::regression_tests::memory_size_alignment_issue ---
running 1 test
--- STDERR: near-vm-runner tests::regression_tests::memory_size_alignment_issue ---
thread 'tests::regression_tests::memory_size_alignment_issue' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x7fac7002ecac', /home/jakmeier/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-near-2.4.0/src/instance/mod.rs:230:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5 |
Seeing the line on which this fails, it looks like rustc is now enabling ASAN by default. We fixed the same issue on near-vm when the fuzzer noticed it in #9063. We did not fix it for wasmer because of the hassle it would be to make a new wasmer release compared to the fact that we don’t support platforms that require alignment anyway, so the issue is moot without ASAN enabled. Given it looks like this is becoming an actual problem, I’ve backported #9063 to wasmer#156, but AFAICT only @nagisa has the access rights needed to make a new release there. Either way, nothing scary for what we already landed, it just means that a previously-benign issue has now become important, and we’ll need to make one change to wasmer before landing this |
This can’t be true! Enabling ASAN by default has non-trivial implications, not to mention that the failure mode between ASAN (abort) and what’s seen here (a panic) are different. Instead this looks to be rust-lang/rust#98112. |
Wasmer 2.4.1 has been released. diff --git a/Cargo.toml b/Cargo.toml
index fe319b01f..bdbc672bc 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -316,14 +316,14 @@ trybuild = "1.0.11"
turn = "0.6"
validator = "0.12"
wasm-encoder = "0.27.0"
-wasmer-compiler = { package = "wasmer-compiler-near", version = "=2.4.0" }
-wasmer-compiler-singlepass = { package = "wasmer-compiler-singlepass-near", version = "=2.4.0" }
-wasmer-engine = { package = "wasmer-engine-near", version = "=2.4.0" }
-wasmer-engine-universal = { package = "wasmer-engine-universal-near", version = "=2.4.0", features = ["compiler"] }
+wasmer-compiler = { package = "wasmer-compiler-near", version = "=2.4.1" }
+wasmer-compiler-singlepass = { package = "wasmer-compiler-singlepass-near", version = "=2.4.1" }
+wasmer-engine = { package = "wasmer-engine-near", version = "=2.4.1" }
+wasmer-engine-universal = { package = "wasmer-engine-universal-near", version = "=2.4.1", features = ["compiler"] }
wasmer-runtime = { version = "0.18.0", package = "wasmer-runtime-near", features = ["default-backend-singlepass"], default-features = false }
wasmer-runtime-core = { version = "0.18.2", package = "wasmer-runtime-core-near" }
-wasmer-types = { package = "wasmer-types-near", version = "=2.4.0" }
-wasmer-vm = { package = "wasmer-vm-near", version = "=2.4.0" }
+wasmer-types = { package = "wasmer-types-near", version = "=2.4.1" }
+wasmer-vm = { package = "wasmer-vm-near", version = "=2.4.1" }
wasmparser = "0.78" # TODO: unify at least the versions of wasmparser we have in our codebase
wasmprinter = "0.2"
wasm-smith = "0.10" plus a (I would've made a suggestion that you can apply with a single click, but github does not allow commenting on that part of |
Looks better and better. test_wasmer2_artifact_output_stability is the only test failing right now. I don’t recall exactly the purpose of this test and at the moment I’m not sure if the expected can just be updated or not. |
So to be honest I wasn’t really paying attention to what you guys were doing with this change, but from what I’m reading, does support for new instruction needs to be gated on protocol version update? |
We rolled support for it into the
The intent for this test is to detect changes to the cached executable shape, and to remind to increment the version here:
so that this cache is flushed. It is fine to update the expected hashes as long as you update that seed to something like |
This will naturally happen as part of 1.35, so there’s nothing to do here :)
Just to add some context to what @nagisa said, this test has as a goal to validate that we don’t change wasmer while forgetting to clear the compiled contract cache. The upgrade to 2.4.1, needed to make wasmer2 compile-able with rust 1.70, changes codegen and thus requires a clear of the compiled contract cache. Hence the need to update the seed that @nagisa linked, and once that’s done there’s no issue with updating the numbers listed in |
OK, so long as you guys know what you’re doing. ;) My worry was to make sure that contract compiled with 1.70 will be rejected on old blocks even if validator runs node supporting the new instruction. If it all is set up correctly, this PR is ready to land. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR is:
- compiling contracts with rustc 1.70 leads to new instructions being used
- compiling nearcore with rustc 1.70 has no impact on new instructions being used
- the protocol upgrade from 1.35 allows nearcore compiled with any rustc to execute the instructions needed by the contracts compiled with rustc 1.70
Hopefully things make more sense now! :)
As for this PR, LGTM from the contract runtime side, but I’ll let people from core speak about in particular the removal of clone()
s in chain/client/src/tests/catching_up.rs
That was linter complaining. |
With 1.70 toolchain, the contracts are by default compiled in a way
which requires updates to wasmer 2.4.1 as well.
I looked also into replacing usage of once_cell crate but sadly
std::sync::OnceLock doesn’t implement wait method.
Co-authored-by: Jakob Meier mail@jakobmeier.ch
Co-authored-by: Simonas Kazlauskas git@kazlauskas.me
Issue: near/create-near-app#2009
Issue: #9143