Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go signals on aarch64 #1469

Open
jrcheli opened this issue May 5, 2023 · 2 comments
Open

Go signals on aarch64 #1469

jrcheli opened this issue May 5, 2023 · 2 comments
Milestone

Comments

@jrcheli
Copy link
Contributor

jrcheli commented May 5, 2023

While working on #1170, we found an issue with signal handling on aarch64. We're creating this ticket to resolve this issue.

From that ticket:

On aarch64 machines, go processes crash intermittently with a "bad g". This has only been seen on aarch64. This was uncovered in go integration tests (go_20) in test cases for "signalHandlerStatic" and "signalHandlerStaticStripped". By adding a go routine to constantly write to the console, these test now fail within a small number of iterations (on this branch). To see them ASAP, I'd recommend commenting out the other test cases and running the go_20 signalHandler tests in a loop to see the "bad g" failure.

We may not have a perfect understanding, but we believe that when a signal happens while we are on the switched stack (for pcre2), go tries to retrieve the "g" from the stack. It's not going to find "g" on the stack when we're currently on our own stack executing pcre2 code, so go crashes. Without the new go routine that writes to the console, the "bad g" happens very infrequently, but importantly "bad g" in the signalHandlerStatic tests has also been observed on branches that do not have the stack pool implementation here. To the best of our knowledge, we could see this any time a signal handler interrupts us while we're running our c code.

If this description is correct, we think a possible solution is to mimic how go knows if c code is currently running.
https://github.com/golang/go/blob/master/src/runtime/cgocall.go
https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s

@jrcheli
Copy link
Contributor Author

jrcheli commented May 5, 2023

I've created a branch bug/1469-go-signals-arm that contains the go routine that writes to the console. This branch would be a good place to continue this investigation.

@michalbiesek
Copy link
Contributor

While working on it,
I can confirm that the test reintroduced in 24e17d5 will pass if we opt out from all the assembly part and just return -1 without calling pcre2_match_wrapper in com.c for the aarch64 variant. One promising way that can be investigated is to manipulate values with mcontext as described in the MATCH_LIMIT option in https://www.pcre.org/current/doc/html/pcre2perform.html#TOC1 to test if a low value allows to passing test and resolve the problem with bad g.
Another approach as John mentioned is described here:

If this description is correct, we think a possible solution is to mimic how go knows if c code is currently running.
https://github.com/golang/go/blob/master/src/runtime/cgocall.go
https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s

@seanvaleo seanvaleo mentioned this issue Jun 22, 2023
16 tasks
@ricksalsa ricksalsa modified the milestones: 1.4.0, 1.5.0 Jul 6, 2023
@ricksalsa ricksalsa modified the milestones: 1.5.0, 1.4.1, 1.4.2 Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants