Upgrade Rust to >= 1.56.0 #277

Closed
webmaster128 opened this issue Dec 15, 2021 · 18 comments · Fixed by #307
Comments

@webmaster128
Member

webmaster128 commented Dec 15, 2021

When upgrading Rust to 1.56.0 or 1.57.0 in the builder images, we get linkage errors when trying to link the static library (.a) into a Go project (go build and go test) on Alpine Linux. The error message indicates problems with position-independent code.

cp libwasmvm/target/release/libwasmvm.a api/libwasmvm_muslc.a
make update-bindings
# After we build libwasmvm, we have to copy the generated bindings for Go code to use.
# We cannot use symlinks as those are not reliably resolved by `go get` (https://github.com/CosmWasm/wasmvm/pull/235).
cp libwasmvm/bindings.h api
# try running go tests using this lib with muslc
docker run --rm -u 501:20 -v [...]/wasmvm:/mnt/testrun -w /mnt/testrun cosmwasm/go-ext-builder:0008-alpine go build -tags muslc .
go: downloading github.com/stretchr/testify v1.7.0
go: downloading github.com/tendermint/tm-db v0.6.4
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading github.com/pmezard/go-difflib v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
go: downloading github.com/google/btree v1.0.0
go: downloading github.com/syndtr/goleveldb v1.0.1-0.20200815110645-5c35d600f0ca
go: downloading github.com/golang/snappy v0.0.1
docker run --rm -u 501:20 -v [...]/wasmvm:/mnt/testrun -w /mnt/testrun cosmwasm/go-ext-builder:0008-alpine go test -tags muslc ./api ./types
go: downloading github.com/stretchr/testify v1.7.0
go: downloading github.com/tendermint/tm-db v0.6.4
go: downloading github.com/pmezard/go-difflib v1.0.0
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
go: downloading github.com/google/btree v1.0.0
go: downloading github.com/syndtr/goleveldb v1.0.1-0.20200815110645-5c35d600f0ca
go: downloading github.com/golang/snappy v0.0.1
SIGABRT: abort
PC=0xb1a6fc m=11 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0xb1a6fc
stack: frame={sp:0x7f1120bad3a8, fp:0x0} stack=[0x7f1120bb0cd0,0x7f1120bd08d0)

runtime: unknown pc 0xb1a6fc
stack: frame={sp:0x7f1120bad3a8, fp:0x0} stack=[0x7f1120bb0cd0,0x7f1120bd08d0)


goroutine 531 [syscall]:
runtime.cgocall(0x75df30, 0xc0000b1748)
        /usr/local/go/src/runtime/cgocall.go:156 +0x5c fp=0xc0000b16b0 sp=0xc0000b1678 pc=0x55f7bc
github.com/CosmWasm/wasmvm/api._C2func_execute(0x2b085a0, {0x0, 0xc00001d840, 0x20}, {0x0, 0xc000016d20, 0x6b}, {0x0, 0xc0004f6240, 0x3b}, ...)
        _cgo_gotypes.go:287 +0x85 fp=0xc0000b1748 sp=0xc0000b16b0 pc=0x7519c5
github.com/CosmWasm/wasmvm/api.Execute.func1({0xc0002fee40}, {0x0, 0xc00001d840, 0x6c3289}, {0x0, 0xc000016d20, 0x5690b4}, {0x0, 0xc0004f6240, 0x3b}, ...)
        /mnt/testrun/api/lib.go:209 +0x247 fp=0xc0000b18d8 sp=0xc0000b1748 pc=0x758a47
github.com/CosmWasm/wasmvm/api.Execute({0x24}, {0xc00001d840, 0x0, 0xe7dd0c}, {0xc000016d20, 0x2f4, 0xc000165498}, {0xc0004f6240, 0x3b, 0x40}, ...)
        /mnt/testrun/api/lib.go:209 +0x6fa fp=0xc0000b1bd8 sp=0xc0000b18d8 pc=0x75849a
github.com/CosmWasm/wasmvm/api.TestExecuteCpuLoop(0xc000289ba0)
        /mnt/testrun/api/lib_test.go:427 +0x9c5 fp=0xc0000b1f70 sp=0xc0000b1bd8 pc=0x749445
testing.tRunner(0xc000289ba0, 0xd81990)
        /usr/local/go/src/testing/testing.go:1259 +0x102 fp=0xc0000b1fc0 sp=0xc0000b1f70 pc=0x638602
testing.(*T).Run·dwrap·21()
        /usr/local/go/src/testing/testing.go:1306 +0x2a fp=0xc0000b1fe0 sp=0xc0000b1fc0 pc=0x63930a
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0000b1fe8 sp=0xc0000b1fe0 pc=0x5c34a1
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1306 +0x35a

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000083ba0, {0xd6f7f6, 0x5c5c33}, 0xd81990)
        /usr/local/go/src/testing/testing.go:1307 +0x375
testing.runTests.func1(0xc000083ba0)
        /usr/local/go/src/testing/testing.go:1598 +0x6e
testing.tRunner(0xc000083ba0, 0xc000119d18)
        /usr/local/go/src/testing/testing.go:1259 +0x102
testing.runTests(0xc0000d8100, {0x10ebae0, 0x1c, 0x1c}, {0x5da22d, 0xd6e3b0, 0x1120180})
        /usr/local/go/src/testing/testing.go:1596 +0x43f
testing.(*M).Run(0xc0000d8100)
        /usr/local/go/src/testing/testing.go:1504 +0x51d
main.main()
        _testmain.go:97 +0x14b

rax    0x0
rbx    0x0
rcx    0xb1a6fc
rdx    0x0
rdi    0x2
rsi    0x7f1120bad3b0
rbp    0x7f1120bad3b0
rsp    0x7f1120bad3a8
r8     0x0
r9     0x7f114802cbe7
r10    0x8
r11    0x246
r12    0x7f1120bad900
r13    0x7f1120bad4a0
r14    0x901870
r15    0x7f1120bcf520
rip    0xb1a6fc
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
FAIL    github.com/CosmWasm/wasmvm/api  1.578s
ok      github.com/CosmWasm/wasmvm/types        0.010s
FAIL
make: *** [release-build-alpine] Error 1

I don't understand the problem. It might be caused by the LLVM 13 upgrade that is part of Rust 1.56. At least I did not find anything else.

Using Rust 1.55.0 for now.

@webmaster128
Member Author

Turns out there are two test modes (see go help test): "local directory mode" and "package list mode". We use the latter for no particular reason.

The linkage error in Rust >= 1.56.0 (LLVM 13) only occurs for the package list mode. If we change the test mode, we seem to be fine.

@ethanfrey
Member

Weird.
But if it works, go for it.
I don't have the time to dig into that rabbit hole right now

@webmaster128
Member Author

What I figured out along the way:

  1. It seems to be a matter of luck that using .a files works at all. The format does not seem to be very stable. People say you need the same compiler for the static library and the final program. This is fine for C++/C++ or Go/Go, but what does "same compiler" even mean for Rust/Go?
  2. Go has its own linker, but can also work with external linkers from GCC and clang. I could not determine which one is used by default in this case, but switching linkers might help if this or a similar issue comes up again.
  3. Also note this warning from https://hub.docker.com/_/golang:

golang:<version>-alpine
[..]
This variant is highly experimental, and not officially supported by the Go project (see golang/go#19938 for details).

The main caveat to note is that it does use musl libc instead of glibc and friends, which can lead to unexpected behavior. See this Hacker News comment thread for more discussion of the issues that might arise and some pro/con comparisons of using Alpine-based images.

So overall I would not bet money on the existence of a functional CosmWasm static library and an Alpine build.

@hashedone

hashedone commented Feb 2, 2022

Yeah, here is what I am thinking: in general, whether a .a file works depends heavily on name mangling. If you use the same compiler, it works. But it may not work even for C++/C++: you basically cannot use MinGW-compiled libraries with MSVC, they won't link (and that is not a matter of luck; it would never work, because the two have fundamentally different mangling, or at least historically had). Otherwise it is a matter of one compiler knowing how the other mangles. In particular, if you use bindgen to bridge Rust with C++, it basically knows how GCC and MSVC mangle. I am not sure how this works when marrying Rust with Go; I have no experience with that.

But there is one ABI that is very stable and well defined for static libraries, and that is C. There is a reason why bindings to TensorFlow go through its C interface instead of the native C++ one: the C ABI is stable and it just works. I am pretty sure Go has a notion of exporting Go functions/structures the way C does, and of using C functions/structures in Go. Using such C interfaces as a bridge is tedious, but it would probably solve any linkage-level problems. I have a little experience bridging C++ with Rust this way, but no experience with Go at all.

@webmaster128
Member Author

We use a C interface, and all exported functions are non-mangled, like this one:

#[no_mangle]
pub extern "C" fn init_cache(
    data_dir: ByteSliceView,
    supported_features: ByteSliceView,
    cache_size: u32,            // in MiB
    instance_memory_limit: u32, // in MiB
    error_msg: Option<&mut UnmanagedVector>,
) -> *mut cache_t {

All structures that are passed between Rust and Go use a C representation and are defined in auto-generated C header files for use in Go. So this part should be good.

@webmaster128
Member Author

webmaster128 commented Feb 9, 2022

I am running into the next problem. When I use the current builders (Rust 1.55.0), I get a static binary for the demo build here. When I upgrade to the new builders (locally), this binary becomes dynamically linked and requires libgcc at runtime on Alpine, which is not installed by default.

Not sure why go build is changing the build mode, but it is.

$ file ./demo
./demo: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, Go BuildID=ZWU5vBdBDN-wmQCjhpsS/XUoekk49jn_6pjGoSpFi/TbFF7M3UvKJNyy2FmsS_/7yrnTkzSqwiBKhvsXrCr, not stripped

and

$ ldd ./demo
        /lib/ld-musl-x86_64.so.1 (0x7ffbf6074000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7ffbf605a000)
        libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7ffbf6074000)
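
As a cross-check on the file and ldd output above, the same two facts (is there a program interpreter, and which shared libraries are needed) can be read directly from the ELF headers. The following is only a minimal sketch using Go's standard debug/elf package; the helper names linkKind and inspect are made up for illustration and are not part of wasmvm.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// linkKind classifies a binary from two facts found in its ELF headers:
// whether it has a PT_INTERP segment (a dynamic loader such as
// /lib/ld-musl-x86_64.so.1) and which shared libraries it lists as needed.
func linkKind(hasInterp bool, needed []string) string {
	if !hasInterp && len(needed) == 0 {
		return "statically linked"
	}
	return "dynamically linked"
}

// inspect opens an ELF file and reports whether it has a PT_INTERP segment
// and which DT_NEEDED entries (shared library dependencies) it declares.
func inspect(path string) (bool, []string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, nil, err
	}
	defer f.Close()

	hasInterp := false
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			hasInterp = true
		}
	}
	// ImportedLibraries returns the DT_NEEDED entries of the dynamic section.
	needed, err := f.ImportedLibraries()
	if err != nil {
		return false, nil, err
	}
	return hasInterp, needed, nil
}

func main() {
	// Inspect the file given as first argument, or this program itself.
	var path string
	if len(os.Args) > 1 {
		path = os.Args[1]
	} else {
		self, err := os.Executable()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		path = self
	}

	hasInterp, needed, err := inspect(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s: %s\n", path, linkKind(hasInterp, needed))
	for _, lib := range needed {
		fmt.Println("  needs", lib)
	}
}
```

Run against the demo binary from this comment (for example `go run elfcheck.go ./demo`, where the file name elfcheck.go is arbitrary), it should list the same libraries that ldd reports, without needing the target's loader to be installed.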

@maurolacy
Contributor

maurolacy commented Feb 9, 2022

I think you can fix that by installing musl-dev in Alpine (apk add --no-cache musl-dev).

@webmaster128
Member Author

webmaster128 commented Feb 9, 2022

I debugged all the internal build and link commands that go build is doing using go build -x. This looks like the exact same thing.

/usr/local/go/pkg/tool/linux_amd64/link is always the linker with -buildmode=exe, -extld=gcc and a bunch of temporary files that are linked together.

I wonder if we can find out where the libgcc_s.so.1 dependency is coming from. Maybe it is in the new .a file but not in the old one. Does anyone have a clue how to inspect .a files?

@maurolacy
Contributor

maurolacy commented Feb 9, 2022

> I debugged all the internal build and link commands that go build is doing using go build -x. This looks like the exact same thing.
>
> /usr/local/go/pkg/tool/linux_amd64/link is always the linker with -buildmode=exe, -extld=gcc and a bunch of temporary files that are linked together.
>
> I wonder if we can find out where the libgcc_s.so.1 dependency is coming from. Maybe it is in the new .a file but not in the old one. Does anyone have a clue how to inspect .a files?

For symbols, nm should do. .a files are nothing more than a collection of object files. When you link statically, the parts that are required are copied into your executable.

Libgcc should be around, but .a files / static linking will require a static version of it as well, AFAIK.

@maurolacy
Contributor

maurolacy commented Feb 9, 2022

Do you want to generate a static or dynamic version of wasmvm?

@maurolacy
Contributor

I recall passing -fPIC to the (GCC) compiler in the past, to solve position independent code issues.

@webmaster128
Member Author

I just found this in the GCC link options:

-shared-libgcc
-static-libgcc

    On systems that provide libgcc as a shared library, these options force the use of either the shared or static version, respectively. If no shared version of libgcc was built when the compiler was configured, these options have no effect.

    There are several situations in which an application should use the shared libgcc instead of the static version. The most common of these is when the application wishes to throw and catch exceptions across different shared libraries. In that case, each of the libraries as well as the application itself should use the shared libgcc.

    Therefore, the G++ driver automatically adds -shared-libgcc whenever you build a shared library or a main executable, because C++ programs typically use exceptions, so this is the right thing to do.

    If, instead, you use the GCC driver to create shared libraries, you may find that they are not always linked with the shared libgcc. If GCC finds, at its configuration time, that you have a non-GNU linker or a GNU linker that does not support option --eh-frame-hdr, it links the shared version of libgcc into shared libraries by default. Otherwise, it takes advantage of the linker and optimizes away the linking with the shared version of libgcc, linking with the static version of libgcc by default. This allows exceptions to propagate through such shared libraries, without incurring relocation costs at library load time.

    However, if a library or main executable is supposed to throw or catch exceptions, you must link it using the G++ driver, or using the option -shared-libgcc, such that it is linked with the shared libgcc.

But I think it would be very strange if the default were changed by a Rust update. Maybe the gcc version is different because the image was rebuilt.

@maurolacy
Contributor

Maybe you are using a different (non-GNU) linker now (for whatever reason)?

@webmaster128
Member Author

webmaster128 commented Feb 10, 2022

I tested various configurations now with golang:1.17.5-alpine and golang:1.17.6-alpine. The linker is gcc in all cases. This I see in the output of go build -x. The gcc version is always the same (gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027).

I can get libgcc statically linked with go build -ldflags="-extldflags=-static-libgcc". However, libc.musl-x86_64.so.1 remains dynamically linked as soon as I update the Rust version.

Turns out it is very hard to get a fully statically linked binary in Go. No idea why this just works automatically as long as Rust 1.55.0 is used.

@webmaster128
Member Author

Maybe we should stop thinking of this as a static build and think of it as a muslc build instead. There is no real reason why it needs to be static as long as it runs on Alpine machines.

@webmaster128
Member Author

Note to myself: https://dubo-dubon-duponey.medium.com/a-beginners-guide-to-cross-compiling-static-cgo-pie-binaries-golang-1-16-792eea92d5aa discusses the topic and contains a bunch of go build option combinations I can try.

@webmaster128
Member Author

webmaster128 commented Feb 18, 2022

I got the static build to work with

# See "2. If you really need CGO, but not netcgo" in https://dubo-dubon-duponey.medium.com/a-beginners-guide-to-cross-compiling-static-cgo-pie-binaries-golang-1-16-792eea92d5aa
# See also https://github.com/rust-lang/rust/issues/78919 for why we need -Wl,-z,muldefs
go build -x -ldflags "-linkmode=external -extldflags '-Wl,-z,muldefs -static'" -tags muslc \
  -o demo ./cmd

Looking at this now, it is really a surprise that we got static builds before without telling the linker. Unfortunately, node builders will have to pass those flags explicitly from now on in order to produce static builds.

@webmaster128
Member Author

Also interesting for advanced Go binary builds: https://wiki.archlinux.org/title/Go_package_guidelines

4 participants