Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: incorrect compilation of specific functions in wasm #68156

Open
kkoreilly opened this issue Jun 24, 2024 · 17 comments
Open

cmd/compile: incorrect compilation of specific functions in wasm #68156

kkoreilly opened this issue Jun 24, 2024 · 17 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kkoreilly
Copy link

kkoreilly commented Jun 24, 2024

UPDATE: I found a much simpler way to reproduce this issue; see #68156 (comment)

Go version

go version go1.22.4 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/kaioreilly/Library/Caches/go-build'
GOENV='/Users/kaioreilly/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/kaioreilly/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/kaioreilly/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.4/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.4/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/kaioreilly/cogent/core/go.mod'
GOWORK='/Users/kaioreilly/cogent/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/pw/lntpc2cn37v41ct1t0t9t9880000gn/T/go-build988797772=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Apologies for the convoluted steps for reproducing. I tried for several hours to get a more reproducible example, but I could not. Somehow there is some deleterious interaction between these various components, and removing any one of them stops the issue. This may seem like an issue to do with one of these third party libraries, but as you will see, this behavior seems to almost certainly be an issue with the compiler or Chromium itself.

I made a go.work workspace containing local clones of the main/master branches of Cogent Core (https://github.com/cogentcore/core) and goldmark (https://github.com/yuin/goldmark). Then, I went to Cogent Core (cd core) and installed the Cogent Core tool (go install ./cmd/core). Then, I went to the basic example in Cogent Core (cd examples/basic) and ran the program for web using core run web. core run web is a command that runs go build using GOOS=js GOARCH=wasm under the hood. Then, I went to port 8080 of my computer's local IP address (http://192.168.1.x:8080; you can run ifconfig to get yours) in Google Chrome on an Android phone (mine is a Pixel 7 Pro running Android 14 with Chrome 126.0.6478.72). I observed that everything worked as expected.

Then, I changed the following line in basic.go in examples/basic from

import "cogentcore.org/core/core"

to

import (
	"cogentcore.org/core/core"
	_ "github.com/yuin/goldmark"
)

Then, I repeated the core run web step from above, opened a new tab on the Android phone, and went to the same URL again. After waiting for some amount of time (no more than a minute), I observed the issue described below.

What did you see happen?

After following all of the steps above, the browser tab reliably turns from correctly rendering the app to displaying an Aw, Snap! error screen (as in https://support.google.com/chrome/answer/95669). Nothing is printed to the console and no further information about a specific error is given.

What did you expect to see?

I expected to see the browser tab continue to correctly render the app.

After several hours of debugging, I have determined that the cause of the issue can be narrowed down to a single function declared in goldmark: util.ReadWhile. When I change the body of this function to just be panic("ReadWhile"), not only does the issue described above cease to happen, the app does not panic. This means that we are actually not even calling this function in the first place, yet somehow its contents are causing bizarre app crashes. I tried various different variations of this function: println("ReadWhile") with a return 0, false afterward results in no crashing and no print statement, whereas the same but with fmt.Println("ReadWhile") results in crashing just like the original body of the function. Several variations on the argument and loop structure of the function have not produced any changes in the crashing behavior; you basically have to remove the entire body of the function for it to stop crashing. Changing the name to SomethingElse also does not stop the crashing. I also tried moving this ReadWhile function elsewhere, but somehow I could not replicate the crash without it being in the same place in goldmark.

The fact that the innocuous contents of a function that is never called lead the program to randomly crash without any further details suggests that this is not a coding error on the part of a third party library but rather some issue with the way that the compiler is building wasm files, or the way that Chromium on Android is handling them. Cogent Core has its own custom wasm_exec.js file, but I verified that the issue still occurs with the standard version contained in misc.

I have only been able to reproduce this issue in Chrome on Android and other Chromium browsers on Android such as Kiwi Browser. It does not occur in Firefox on Android, Safari on iOS, or Chrome on macOS, Ubuntu, or Windows.

@gabyhelp
Copy link

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@joedian joedian added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Jun 25, 2024
@matloob
Copy link
Contributor

matloob commented Jul 2, 2024

cc @aclements

@kkoreilly
Copy link
Author

kkoreilly commented Jul 15, 2024

Both wasm2wat app.wasm -o app.wat (from https://github.com/WebAssembly/wabt) and wasm-opt app.wasm -o opt.wasm -O --enable-bulk-memory (from https://github.com/WebAssembly/binaryen) fail to correctly process app.wasm if and only if it is the version including util.ReadWhile that fails on Chrome on Android. This shows that the issue here is with the way Go is compiling util.ReadWhile for wasm, not the way that Chrome is handling it, given that two other widely used programs fail with the same crashing wasm file. This also removes the possibility that wasm_exec.js or any of the other JS files are involved in causing the crash, since the other programs fail with the wasm file completely independent of those other files.

wasm2wat fails immediately with:

~/cogent/core/examples/basic/bin/web/ > wasm2wat app.wasm -o app.wat
error: label stack exceeds max nesting depth
102a212: error: OnBlockExpr callback failed

wasm-opt does not fail with an overt error message; it just continues to run forever while consuming increasing amounts of RAM (40+ GB) and creating an opt.wasm file with a ballooning size.

This also creates a much easier, faster, and more reliable way to check whether an app will fail, which removes the need for an Android device and the core tool to reproduce this issue. If this command fails, the program will crash; otherwise, it will work:

GOOS=js GOARCH=wasm go build -o app.wasm && wasm2wat app.wasm -o app.wat

@kkoreilly
Copy link
Author

kkoreilly commented Jul 15, 2024

I have reduced the code necessary to reproduce the issue to this:

package main

import (
	"github.com/jinzhu/copier"
	_ "github.com/yuin/goldmark"
)

func main() {
	copier.Copy(&struct{}{}, struct{}{})
}

You just have to run this command with that code to reproduce the issue:

GOOS=js GOARCH=wasm go build -o app.wasm && wasm2wat app.wasm -o app.wat

@kkoreilly
Copy link
Author

Inside of copier, the cause of the crashing is the four reflect.Value.MethodByName calls in copier.copier. If you comment all four of those out, the crashing stops. As a reminder from above, if you comment out the contents of util.ReadWhile in goldmark, it also stops crashing. This seems to indicate that the cause of the issue is some bizarre interaction between these two functions that results in incorrectly compiled wasm.

Also, just to be clear, we are not actually running the wasm file in the latest reproduction of this issue, just trying to process it, so this has nothing to do with these functions causing issues at runtime. This is only about the compiled wasm.

@kkoreilly kkoreilly changed the title cmd/go: incorrect compilation of a specific function results in wasm crashes in Chrome on Android cmd/go: incorrect compilation of specific functions in wasm Jul 15, 2024
@kkoreilly
Copy link
Author

kkoreilly commented Jul 15, 2024

Actually, you can reproduce the issue even without copier using this code:

package main

import (
	"bytes"

	"github.com/yuin/goldmark"
)

func main() {
	goldmark.New().Convert([]byte{}, &bytes.Buffer{})
}

And the same command as above:

GOOS=js GOARCH=wasm go build -o app.wasm && wasm2wat app.wasm -o app.wat

The only way to get this code to work is to comment out util.ReadWhile in goldmark as described above.

@kkoreilly
Copy link
Author

Also note that in other situations with more complicated code, even commenting out the contents of util.ReadWhile does not solve the issue, meaning that there is no viable workaround for this issue.

@matloob
Copy link
Contributor

matloob commented Jul 15, 2024

cc @golang/wasm

@kkoreilly
Copy link
Author

kkoreilly commented Jul 15, 2024

By copying code from goldmark and experimenting, I have gotten this down to a minimal standalone reproducible example, which I have attached here (it is a .txt file because GitHub does not accept .go files). The issue appears to be a large map that goldmark defines, html5entities. I have gotten it to a point where the program fails, but removing any element of the map or the Characters field of any element of the map results in the program working. This implies that the issue has to do with excessively large objects not being compiled correctly in wasm.

Here is a simple list of the steps required to reproduce this issue now:

  1. Download the attached file and place it in a new Go module
  2. Download wasm2wat from https://github.com/WebAssembly/wabt
  3. Run this command:
GOOS=js GOARCH=wasm go build -o app.wasm && wasm2wat app.wasm -o app.wat

You should observe the following error indicating that the compiled wasm is invalid:

error: label stack exceeds max nesting depth
00ea012: error: OnBlockExpr callback failed

Note that this same issue results not only in this error but also wasm-opt failing and the program crashing in Chrome on Android, as stated in previous comments. You can further observe this by going to the goldmark playground in Chrome on an Android device, and you will see that it crashes with an Aw, Snap! message after some amount of time, as also discussed previously.

Also note that this is not the only way to reproduce this issue, and there are other combinations of functions that result in the same issue as discussed above.

@matloob
Copy link
Contributor

matloob commented Jul 15, 2024

@kkoreilly If I understand correctly this is unrelated to to the go command itself, right? It sounds like this is perhaps an issue in the compiler, or elsewhere in the toolchain? I'm asking so we can get the right people to look at it.

@kkoreilly
Copy link
Author

Yes, it is an issue with the compiler for wasm, not the go command itself.

@matloob matloob changed the title cmd/go: incorrect compilation of specific functions in wasm cmd/compile: incorrect compilation of specific functions in wasm Jul 15, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 15, 2024
@matloob matloob removed the GoCommand cmd/go label Jul 15, 2024
@rcoreilly
Copy link

A further note overall: the wasm2wat command is the most sensitive indicator of the issue, and even when it is just under threshold of crashing, it produces extremely large .wat files and takes a long time to run. This is consistent with some kind of "overloading" problem that is cumulative, and might explain the otherwise strange interactions with the copier package, and the fact that even when we eliminated the offending html5entities in another test case, it still failed in our larger app context.

Chromium on the android platform appears to be the most sensitive to this overloading problem, as indicated by this report: https://issues.chromium.org/issues/40770034 and others like it. Interestingly, this issue contains a reference to this other Go project: https://github.com/bvisness/golang-wasm-turbofan-crash -- the author there notes:

Turns out the Go compiler generates very branchy code for map initialization in some cases. The problem went away when we changed our initialization approach to produce fewer branches.

So this is consistent with what we are observing. I could not find a prior Go issues report for this case.

Finally, as the above issue notes, the problem arises when this code is being optimized by the TurboFan stage of chromium wasm loading: https://v8.dev/docs/wasm-compilation-pipeline which explains the otherwise puzzling delay in seeing the crash on an android device. We were able to modify the chrome flags to disable the baseline loader and only use TurboFan, which resulted in a deterministic crash after the optimization process presumably ran out of memory.

Given how sensitive wasm-opt and wasm2wat are to this issue, and the clearly large amount of RAM it is consuming (wasm-opt went over 48GB on my macbook pro with 64GB), it is likely that fixing this branchy code generation will benefit all platforms in terms of wasm loading and reducing the load on the TurboFan step so the optimized code is available sooner. Therefore, it would seem that this should be a very high priority issue overall!

@neelance
Copy link
Member

@rcoreilly One reason for this is WebAssembly/design#796 (comment).

@rcoreilly
Copy link

@neelance so it looks like maybe there are some possible solutions but we shouldn't be holding our breath? :)

@rcoreilly
Copy link

Also, in the meantime, is there any documentation for how to avoid some of the worst-case scenarios, e.g., for initializing maps or other problematic cases?

@rcoreilly
Copy link

rcoreilly commented Jul 15, 2024

Similar underlying issues for reference: #65440 #42979 #43033

@mknyszek
Copy link
Contributor

CC @golang/compiler

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

9 participants