Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to wasmvm beta7 #774

Merged
merged 3 commits into from
Mar 9, 2022
Merged

Update to wasmvm beta7 #774

merged 3 commits into from
Mar 9, 2022

Conversation

ethanfrey
Copy link
Member

@ethanfrey ethanfrey commented Mar 7, 2022

CI to test on amd64

On Mac M1,

go build ./... and make install pass

go test -count=1 ./x/wasm/client/... passes
go test -count=1 ./x/wasm/keeper fails:

signal 16 received but handler not on signal stack
fatal error: non-Go code set up signal handler without SA_ONSTACK flag

runtime stack:
runtime: unexpected return pc for runtime.sigtramp called from 0x1a96c44e4
stack: frame={sp:0x14000f9c7d0, fp:0x14000f9c8a0} stack=[0x14000f946e0,0x14000f9cae0)
0x0000014000f9c6d0:  0x0000000000000039  0x0000014000f9c748 
0x0000014000f9c6e0:  0x000000010244a6d0 <runtime.sigtrampgo+0x00000000000001a0>  0x0000000000000010 
0x0000014000f9c6f0:  0x0000014000f9c730  0x0000014000f9c748 

Narrowing it down:

$ go test -v -count=1 ./x/wasm/keeper -run TestExecuteWith

=== RUN   TestExecuteWithDeposit
=== RUN   TestExecuteWithDeposit/actor_with_funds
=== RUN   TestExecuteWithDeposit/actor_without_funds
=== RUN   TestExecuteWithDeposit/blocked_address_as_actor
=== RUN   TestExecuteWithDeposit/coin_transfer_with_all_transfers_disabled
=== RUN   TestExecuteWithDeposit/coin_transfer_with_transfer_denom_disabled
=== RUN   TestExecuteWithDeposit/blocked_address_as_beneficiary
--- PASS: TestExecuteWithDeposit (0.19s)
    --- PASS: TestExecuteWithDeposit/actor_with_funds (0.04s)
    --- PASS: TestExecuteWithDeposit/actor_without_funds (0.03s)
    --- PASS: TestExecuteWithDeposit/blocked_address_as_actor (0.03s)
    --- PASS: TestExecuteWithDeposit/coin_transfer_with_all_transfers_disabled (0.03s)
    --- PASS: TestExecuteWithDeposit/coin_transfer_with_transfer_denom_disabled (0.03s)
    --- PASS: TestExecuteWithDeposit/blocked_address_as_beneficiary (0.03s)
=== RUN   TestExecuteWithNonExistingAddress
--- PASS: TestExecuteWithNonExistingAddress (0.00s)
=== RUN   TestExecuteWithPanic
--- PASS: TestExecuteWithPanic (0.03s)
=== RUN   TestExecuteWithCpuLoop

Note that go test -v -count=1 ./x/wasm/keeper -run TestExecuteWithCpu passes about 2 of 3 runs

@ethanfrey ethanfrey requested a review from alpe as a code owner March 7, 2022 20:41
@codecov
Copy link

codecov bot commented Mar 7, 2022

Codecov Report

Merging #774 (feeccb6) into master (f35a13f) will decrease coverage by 0.01%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #774      +/-   ##
==========================================
- Coverage   58.61%   58.59%   -0.02%     
==========================================
  Files          49       50       +1     
  Lines        5835     5838       +3     
==========================================
+ Hits         3420     3421       +1     
- Misses       2165     2166       +1     
- Partials      250      251       +1     
Impacted Files Coverage Δ
x/wasm/keeper/m1compat.go 33.33% <33.33%> (ø)

@ethanfrey
Copy link
Member Author

ethanfrey commented Mar 7, 2022

With feeccb6 I get go test -count=1 ./... to pass on my MacBook. But only by conditionally skipping tons of tests on arch = "arm64".

At least this is a starting point to investigate.

Note that each individual test may pass most of the time. So if you just unskip one and test it, you may need to test it a few times. However, when I remove the SkipIfM1(t) line, this command reliably failed:

go test -p 1 -v -count=1 ./x/wasm/keeper -run TestDispatchSubMsgErrorHandling

Always on === RUN TestDispatchSubMsgErrorHandling/out_of_gas_caught_with_gas_limit

@webmaster128
Copy link
Member

Thank you for giving it a try. I have a quick look but not idea what this issue means.

@@ -672,6 +672,7 @@ func TestExecuteWithNonExistingAddress(t *testing.T) {
}

func TestExecuteWithPanic(t *testing.T) {
SkipIfM1(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These work for me. If I turn this into a noop / remove it, tests pass now.

@maurolacy
Copy link
Contributor

maurolacy commented Mar 8, 2022

The error "signal 16 received but handler not on signal stack
fatal error: non-Go code set up signal handler without SA_ONSTACK flag"

comes from a low-level (non-Go) dependency.
Not sure exactly which one, but this looks fixed now, as tests pass (for me) without the M1 skip.

Here's my C compiler for reference:

$ gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin21.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
$ 
$ go version
go version go1.17.7 darwin/amd64
$ 

@webmaster128
Copy link
Member

webmaster128 commented Mar 9, 2022

When I check out (c8a7b83) i.e. before the fix, I get the same error as Ethan.

$ gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin21.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

$ go version
go version go1.17.8 darwin/arm64

So my target is one minor higher for some reason.

In https://github.com/wasmerio/wasmer/blob/2.2.0/lib/vm/src/trap/traphandlers.rs#L45-L88 you see that Wasmer registers a bunch of signal handlers with SA_ONSTACK. None of them has number 16 as far as I can tell. So the big question is, what is signal 16 and where is this signal handler registered?

@ethanfrey
Copy link
Member Author

Interesting check with the gcc:

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin21.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Again: Target: arm64-apple-darwin21.3.0 like Simon, not Mauro. Not sure if this is something we can configure?

@ethanfrey
Copy link
Member Author

We have three choices:

  1. Release 0.24 without this update
  2. Release 0.24 with this PR as is, and follow up later to make it work fully on m1 (it is better on m1 than before, but we should comment that m1 is still not supported)
  3. Dig in deep and make this work perfectly for 0.24.

I don't know how long #3 will take, but unless someone has time/energy and can finish it today or early tomorrow, I would rather not block on that. I am okay with both (1) and (2) or even (2) with 4 hours of investigation (time boxed 3)

@webmaster128
Copy link
Member

I'm fine with (2), especially since there are people who need Wasmer 2.2.0 fixes independent of ARM.

@ethanfrey
Copy link
Member Author

I'm fine with (2), especially since there are people who need Wasmer 2.2.0 fixes independent of ARM.

Great. Can I get an approval on this? Or a concrete list of change requests.
eg. Shall I add some comment on the SkipIfM1 function?

@maurolacy
Copy link
Contributor

Just seen your updates. I'm OK with releasing with these tests skipped. This looks like some (trivial?) change that needs to be done in one of the C libraries this stuff depends on.

@maurolacy
Copy link
Contributor

maurolacy commented Mar 9, 2022

So my target is one minor higher for some reason.

Reason being, that I'm not a fan of automatic updates and didn't update my machine yet.

In https://github.com/wasmerio/wasmer/blob/2.2.0/lib/vm/src/trap/traphandlers.rs#L45-L88 you see that Wasmer registers a bunch of signal handlers with SA_ONSTACK. None of them has number 16 as far as I can tell. So the big question is, what is signal 16 and where is this signal handler registered?

$ kill -l
 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL
 5) SIGTRAP	 6) SIGABRT	 7) SIGEMT	 8) SIGFPE
 9) SIGKILL	10) SIGBUS	11) SIGSEGV	12) SIGSYS
13) SIGPIPE	14) SIGALRM	15) SIGTERM	16) SIGURG
17) SIGSTOP	18) SIGTSTP	19) SIGCONT	20) SIGCHLD
21) SIGTTIN	22) SIGTTOU	23) SIGIO	24) SIGXCPU
25) SIGXFSZ	26) SIGVTALRM	27) SIGPROF	28) SIGWINCH
29) SIGINFO	30) SIGUSR1	31) SIGUSR2	
$

Signal 16 is SIGURG. Probably some recent lib / OS changes added a sigurg handler somewhere, and didn't set SA_ONSTACK.

I would just wait and test again in a couple of days / weeks, as this will probably resolve "on its own".

@ethanfrey ethanfrey merged commit 2700e6b into master Mar 9, 2022
@ethanfrey ethanfrey deleted the wasmvm-beta7 branch March 9, 2022 11:28
@webmaster128
Copy link
Member

I got response from a Wasmer dev who confirms this is a SIGURG and says they never came across this in Wasmer. This supports the idea that it is in some OS library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants