Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade rules_go to v0.47.0 #6436

Merged
merged 2 commits into from
Apr 30, 2024
Merged

Upgrade rules_go to v0.47.0 #6436

merged 2 commits into from
Apr 30, 2024

Conversation

sluongng
Copy link
Contributor

No description provided.

@sluongng sluongng requested a review from fmeum April 24, 2024 13:36
MODULE.bazel Show resolved Hide resolved
// require.Error(t, res.Error)
// assert.Contains(t, res.Error.Error(), "signal: terminated")
// assert.Equal(t, -1, res.ExitCode)
// }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bduffany FYI.

I thought this test only killed the sh shell, but for some reason sigterm is being ignored here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can restore the original behavior with signal.Reset(os.SIGTERM), but its slightly suspicious that this is needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try bisecting rules_go tmr to verify the root cause

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bisect confirmed the culprit PR.

signal.Reset did not fix the test, but adding our own signal.Notify handler worked.

@@ -140,7 +140,7 @@ func TestExecStreamed_Crash(t *testing.T) {
Arguments: []string{"bash", "-c", `
echo foo-stdout >&1
echo bar-stderr >&2
kill 0 -KILL
Copy link
Member

@bduffany bduffany Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why this change is needed? I'd be surprised if upgrading rules_go broke this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://buildbuddy.buildbuddy.io/invocation/d1ddf6d1-cf88-4de1-89bd-bf39c97ca43d here is an example of this failure.

From https://man7.org/linux/man-pages/man1/kill.1.html, we should use kill -KILL -- 0 to be sure.

It's unclear to me why bazel-contrib/rules_go#3920 triggered this though.

Comment on lines 125 to 124
// Since https://github.com/bazelbuild/rules_go/pull/3920
// We need to make sure that our go test binary properly capture and handle SIGTERM
_, stop := signal.NotifyContext(ctx, syscall.SIGTERM)

res := runSh(ctx, "kill -TERM $$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the SIGTERM signal here should only be getting sent to the sh process running the kill -TERM command so I'm a bit confused why this change is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I am not sure either... I tried to dump out the process tree plus the process group id but I could not tell for certain how the sigterm could propagate from the child process, the /bin/sh shell, to the parent process (the go test binary). Or if there is something else (i.e. Bazel) sending the go binary sigkterm.

A simpler solution here for me is to patch rules_go with the old signal.Notify behavior. Which, as I discussed with the rules_go PR author, seems like it would work.

Raised the rules_go PR for this and in the meantime, patch our own rules_go with that fix.

Copy link
Member

@bduffany bduffany Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the error that was happening without the _, stop := signal.NotifyContext(ctx, syscall.SIGTERM) workaround? (before applying the revert patch to rules_go)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no the error was pretty much caused by the change in rules_go. I have bisected rules_go to that exact PR.

I think that PR made it so that the test itself could not use sigterm.

Copy link
Member

@bduffany bduffany Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, what was the test failure you were seeing? I understand it was caused by rules_go but trying to understand why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I misunderstood your question 🤦

Here is an example failure with the revert of patching rules_go https://buildbuddy.buildbuddy.io/invocation/f016c53f-2073-476b-a338-939a56ef720d?target=%2F%2Fenterprise%2Fserver%2Fremote_execution%2Fcommandutil%3Acommandutil_test&targetStatus=6

Here is with a small patch to get a bit more verbose output

--- a/enterprise/server/remote_execution/commandutil/commandutil_unix_test.go
+++ b/enterprise/server/remote_execution/commandutil/commandutil_unix_test.go
@@ -123,6 +123,7 @@ func TestRun_Killed_ErrorResult(t *testing.T) {
        {
                res := runSh(ctx, "kill -TERM $$")

+               t.Logf("ExitCode: %d\nStdout: %s \n---\nStderr: %s\n", res.ExitCode, string(res.Stdout), string(res.Stderr))
                require.Error(t, res.Error)
                assert.Contains(t, res.Error.Error(), "signal: terminated")
                assert.Equal(t, -1, res.ExitCode)

https://buildbuddy.buildbuddy.io/invocation/09926add-3892-4b97-8d24-69955437fb0b?target=%2F%2Fenterprise%2Fserver%2Fremote_execution%2Fcommandutil%3Acommandutil_test&targetStatus=6

We were expecting an error, but the command simply executed successfully.

Copy link
Member

@bduffany bduffany Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, TIL about sigaction. IIUC, the go runtime is ignoring SIGTERM by ultimately doing a syscall like sigaction(SIGTERM, &sigaction{sa_handler: SIG_IGN}, NULL). The installed action handlers are then inherited by child processes, which means that child processes will ignore signals too.

So it does seem like rules_go should not be doing signal.Ignore as that will likely break more people than just us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sluongng sluongng merged commit ee29cfb into master Apr 30, 2024
19 checks passed
@sluongng sluongng deleted the sluongng/rules-go-47.0 branch April 30, 2024 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants