Crash "SIGABRT: abort"/"signal arrived during cgo execution" during store code on Alpine 3.19 #523

Closed
sampocs opened this issue Mar 11, 2024 · 12 comments
@sampocs

sampocs commented Mar 11, 2024

Context

Stride recently added cosmwasm to the chain and when testing we were noticing stochastic panics when uploading contracts. We eventually resolved this by upgrading the dockerfile from alpine 3.16 to 3.17 but I'd imagine there should also be a change to wasmvm to gracefully catch this exception instead of crashing the chain.

Specifics

This occurred on wasmd v0.45.0 and wasmvm v1.5.2 (although, I believe we tried out a few other version combinations while debugging and saw the same issue). This was also reproduced on both mac M1 and linux.

To test, we started up a network locally with docker and uploaded the same contract repeatedly. We noticed that eventually one of the uploads would fail and take down the chain. The exact upload that caused the panic seemed to be stochastic (e.g. during one run, it would be the 3rd upload, then we'd restart the chain from scratch and this time the panic would occur on the 5th upload, etc.)

The error log is shown below (full logs here). We traced it back to this line, but debugging beyond that is a bit out of my depth, unfortunately.

dockernet-stride1-1  | goroutine 55371 [syscall]:
dockernet-stride1-1  | runtime.cgocall(0x22ba974, 0x400bc96b88)
dockernet-stride1-1  | 	runtime/cgocall.go:157 +0x44 fp=0x400bc96b50 sp=0x400bc96b10 pc=0x44d674
dockernet-stride1-1  | github.com/CosmWasm/wasmvm/internal/api._C2func_save_wasm(0xffff4b5124f0, {0x0, 0x400bd02000, 0x2b07f}, 0x0, 0x4003c66dc0)
dockernet-stride1-1  | 	_cgo_gotypes.go:662 +0x40 fp=0x400bc96b80 sp=0x400bc96b50 pc=0x1347960
dockernet-stride1-1  | github.com/CosmWasm/wasmvm/internal/api.StoreCode.func1({0x2dc3200?}, {0xa0?, 0x400bd02000?, 0x1d317f0?}, 0x0?)
dockernet-stride1-1  | 	github.com/CosmWasm/wasmvm@v1.5.2/internal/api/lib.go:65 +0x84 fp=0x400bc96c20 sp=0x400bc96b80 pc=0x134a254
dockernet-stride1-1  | github.com/CosmWasm/wasmvm/internal/api.StoreCode({0x1?}, {0x400bd02000?, 0x0?, 0x14?})
dockernet-stride1-1  | 	github.com/CosmWasm/wasmvm@v1.5.2/internal/api/lib.go:65 +0xe4 fp=0x400bc96cf0 sp=0x400bc96c20 pc=0x134a0c4
dockernet-stride1-1  | github.com/CosmWasm/wasmvm.(*VM).StoreCode(0x400bc30000?, {0x400bd02000?, 0x322b77e?, 0xc8000?})
dockernet-stride1-1  | 	github.com/CosmWasm/wasmvm@v1.5.2/lib.go:60 +0x24 fp=0x400bc96d20 sp=0x400bc96cf0 pc=0x1353f54
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/keeper.Keeper.create({{0x5264ba8, 0x4001606560}, {0x52ac878, 0x400113bf20}, {0x526b280, 0x400106cc30}, {0x525a2e0, 0x4001607870}, {0x5259c20, 0x4000c50760}, ...}, ...)
dockernet-stride1-1  | 	github.com/CosmWasm/wasmd@v0.45.0/x/wasm/keeper/keeper.go:181 +0x44c fp=0x400bc983d0 sp=0x400bc96d20 pc=0x1d3188c
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/keeper.msgServer.StoreCode({0x3048360?}, {0x528a3c8, 0x40095f9740}, 0x4006b86d20)
dockernet-stride1-1  | 	github.com/CosmWasm/wasmd@v0.45.0/x/wasm/keeper/msg_server.go:38 +0x198 fp=0x400bc98bc0 sp=0x400bc983d0 pc=0x1d42608
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/keeper.(*msgServer).StoreCode(0x2a0?, {0x528a3c8?, 0x40095f9740?}, 0x31ece60?)
dockernet-stride1-1  | 	<autogenerated>:1 +0x34 fp=0x400bc98bf0 sp=0x400bc98bc0 pc=0x1d58bf4
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/types._Msg_StoreCode_Handler.func1({0x528a3c8, 0x40095f9740}, {0x3144fa0?, 0x4006b86d20})
dockernet-stride1-1  | 	github.com/CosmWasm/wasmd@v0.45.0/x/wasm/types/tx.pb.go:2209 +0x74 fp=0x400bc98c30 sp=0x400bc98bf0 pc=0x15260b4
dockernet-stride1-1  | github.com/cosmos/cosmos-sdk/baseapp.(*MsgServiceRouter).RegisterService.func2.1({0x5289e18, 0x40048302c0}, {0x400bc98cd8?, 0x110dc4c?}, 0x2a0?, 0x40038a6390)
dockernet-stride1-1  | 	github.com/cosmos/cosmos-sdk@v0.47.5/baseapp/msg_service_router.go:118 +0x98 fp=0x400bc98c80 sp=0x400bc98c30 pc=0x110dec8
dockernet-stride1-1  | github.com/CosmWasm/wasmd/x/wasm/types._Msg_StoreCode_Handler({0x312e4c0?, 0x4000c285f8}, {0x5289e18, 0x40048302c0}, 0x4c6dd58, 0x4003c66d00)
dockernet-stride1-1  | 	github.com/CosmWasm/wasmd@v0.45.0/x/wasm/types/tx.pb.go:2211 +0x12c fp=0x400bc98ce0 sp=0x400bc98c80 pc=0x1525f8c
dockernet-stride1-1  | github.com/cosmos/cosmos-sdk/baseapp.(*MsgServiceRouter).RegisterService.func2({{0x528a3c8, 0x4006746660}, {0x52a0ec0, 0x40032fd840}, {{0xb, 0x0}, {0x4008e135ba, 0x6}, 0x3b3, {0x117a76ed, ...}, ...}, ...}, ...)

Next Steps

I'll defer to you all on how best to handle this. I'm happy to put together a branch with instructions to recreate if it'd be helpful - just let me know!

@webmaster128
Member

Thank you for the report. This is an issue we got reported elsewhere already too. The common error is above your log snippet:

dockernet-stride1-1  | SIGABRT: abort
dockernet-stride1-1  | PC=0x2bd8ccc m=14 sigcode=18446744073709551610
dockernet-stride1-1  | signal arrived during cgo execution

What we know so far is that it is some sort of problem with more recent Alpine versions. E.g. one reporter said

Actually, Alpine 3.17 and building with go1.20 instead of 1.21 also solves the issue.

We never saw this issue on GNU linux.
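
For anyone triaging similar crashes, here is a minimal sketch of how a node log could be checked for this signature. It assumes only the two marker strings quoted above; the helper name `hasCgoAbort` is hypothetical and not part of wasmvm or wasmd.

```go
package main

import (
	"fmt"
	"strings"
)

// cgoAbortMarkers are the two log fragments quoted in this thread.
var cgoAbortMarkers = []string{
	"SIGABRT: abort",
	"signal arrived during cgo execution",
}

// hasCgoAbort (hypothetical helper) reports whether the log contains
// every marker. It does not check ordering or which goroutine crashed.
func hasCgoAbort(log string) bool {
	for _, marker := range cgoAbortMarkers {
		if !strings.Contains(log, marker) {
			return false
		}
	}
	return true
}

func main() {
	// Sample taken from the snippet above.
	log := "dockernet-stride1-1  | SIGABRT: abort\n" +
		"dockernet-stride1-1  | PC=0x2bd8ccc m=14 sigcode=18446744073709551610\n" +
		"dockernet-stride1-1  | signal arrived during cgo execution\n"
	fmt.Println("matches crash signature:", hasCgoAbort(log))
}
```

A match only says the log looks like the crash reported here. It does not prove the same root cause.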

@webmaster128
Member

I was able to reproduce the issue locally using just wasmd. Turns out it depends on the system used to build the chain, not the one running the chain. The problem starts with Alpine 3.19:

[Screenshot, 2024-03-12 18:22:02]

@gorgos

gorgos commented Mar 12, 2024

The root cause is very likely inside Wasmer, related to the muslc logic; I've left a comment here. The relevant Wasmer code is here.

At Injective we resolved it by using a Debian image (which uses glibc) instead of Alpine Linux. The Babylon chain had the same issue and resolved it the same way: babylonchain/babylon#427.

@webmaster128
Member

@sampocs Do you have more info about which Alpine 3.16 setup you used initially? Alpine 3.16 being affected is confusing. For us the problem is rather new (late 2023, after the Alpine 3.19 release), and we have not heard about it from older Alpine versions. Also, I don't see anything open in Wasmer, so it is likely that most Alpine versions are not affected.

@webmaster128
Member

Found it. The version before this commit Stride-Labs/stride@6a8f0ce used golang:${GO_VERSION}-alpine with GO_VERSION="1.21", and golang:1.21-alpine was at that time the same as golang:1.21-alpine3.19. I.e. you had build image 3.19 and runtime image 3.16. As my tests above show, the problem is in the build image, not the runtime image.
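
Since the floating `-alpine` tag is what hid the build image version here, a small check like the following can flag it. This is only a sketch: the function names are hypothetical, the 3.19 threshold comes from the tests in this thread, and it only matters for wasmvm versions that do not include the Wasmer fix.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// alpineTag matches an explicit Alpine version such as "alpine3.19".
var alpineTag = regexp.MustCompile(`alpine(\d+)\.(\d+)`)

// classifyBuildImage (hypothetical helper) inspects a Docker image
// reference and reports how it relates to the Alpine 3.19 threshold.
func classifyBuildImage(image string) string {
	if !strings.Contains(image, "alpine") {
		return "not alpine"
	}
	m := alpineTag.FindStringSubmatch(image)
	if m == nil {
		// e.g. golang:1.21-alpine, which follows the latest Alpine release.
		return "floating tag"
	}
	major, _ := strconv.Atoi(m[1]) // digits only, so Atoi cannot fail here
	minor, _ := strconv.Atoi(m[2])
	if major > 3 || (major == 3 && minor >= 19) {
		return "3.19 or newer"
	}
	return "older than 3.19"
}

func main() {
	for _, image := range []string{
		"golang:1.21-alpine",
		"golang:1.21-alpine3.19",
		"golang:1.20-alpine3.18",
	} {
		fmt.Printf("%s: %s\n", image, classifyBuildImage(image))
	}
}
```

Treating "floating tag" as a warning is a deliberate choice: the tag itself does not say which Alpine release was used at build time, which is what made this case hard to spot.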

webmaster128 changed the title from 'Catch panic during store code on alpine 3.16' to 'Crash "SIGABRT: abort"/"signal arrived during cgo execution" during store code on Alpine 3.19' on Mar 13, 2024
@webmaster128
Member

Okay, it seems like Wasmtime had the same issue and fixed it. Essentially the deal is

Previously this decision was static. FreeBSD and Linux glibc would assume libgcc and everything else was assumed to be libunwind. It's possible to use libgcc on other platforms, however, such as with musl.

Wasmer ticket here now: wasmerio/wasmer#4488

@sampocs
Author

sampocs commented Mar 14, 2024

@webmaster128 sorry for late reply, but yeah you're right we were building with 3.19 and running with 3.16!

Glad to hear you tracked down the issue though!

dadamu added a commit to desmos-labs/desmos that referenced this issue Apr 2, 2024
## Description

Closes: #XXXX

This PR downgrades the Alpine version of the build environment to 3.18
to avoid the issues with Wasmer.

References:
CosmWasm/wasmvm#523
wasmerio/wasmer#4488

## Summary by CodeRabbit


- **Chores**
- Updated the base image for the Desmos Builder to
`golang:1.20-alpine3.18` for improved stability and performance.

@webmaster128
Member

Does anyone have experience with this problem and Go 1.22?

In my tests I see the same behaviour as in Go 1.21

[Screenshot, 2024-05-31 00:49:38]

@webmaster128
Member

webmaster128 commented Jun 27, 2024

This is now fixed in Wasmer but not yet included in a Wasmer release. So we'll likely close this as part of CosmWasm 2.2

@webmaster128
Member

Done in 2.1

@webmaster128
Member

If anyone needs to support Go 1.23+ and wasmvm < 2.1 you can create a custom Alpine 3.18 image and install the Go version you need as follows: #576 (comment).

However, the recommended solution is to upgrade to an up-to-date version of wasmvm and wasmd.
