
os: (*Process).Wait sometimes hangs on netbsd #50138

Closed
bcmills opened this issue Dec 13, 2021 · 55 comments
Labels: FrozenDueToAge, NeedsInvestigation, OS-NetBSD

@bcmills
Contributor

bcmills commented Dec 13, 2021

greplogs --dashboard -md -l -e 'panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait(?:.*\n)+FAIL\s+cmd/link'

2021-12-12T06:14:07-9c6e8f6/netbsd-386-9_0-n2

goroutine 29 [syscall, 2 minutes]:
syscall.Syscall6(0x21a8, 0x892ee8c, 0x0, 0x8a2c1e0, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/syscall/asm_unix_386.s:43 +0x5 fp=0x892ee38 sp=0x892ee34 pc=0x80b9605
syscall.wait4(0x21a8, 0x892ee8c, 0x0, 0x8a2c1e0)
	/tmp/workdir/go/src/syscall/zsyscall_netbsd_386.go:34 +0x5b fp=0x892ee70 sp=0x892ee38 pc=0x80b737b
syscall.Wait4(0x21a8, 0x892eeb0, 0x0, 0x8a2c1e0)
	/tmp/workdir/go/src/syscall/syscall_bsd.go:144 +0x3b fp=0x892ee94 sp=0x892ee70 pc=0x80b558b
os.(*Process).wait(0x8a04660)
	/tmp/workdir/go/src/os/exec_unix.go:43 +0x82 fp=0x892eec8 sp=0x892ee94 pc=0x80de982
os.(*Process).Wait(...)
	/tmp/workdir/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0x8a18fd0)
	/tmp/workdir/go/src/os/exec/exec.go:507 +0x4d fp=0x892ef0c sp=0x892eec8 pc=0x816b07d
os/exec.(*Cmd).Run(0x8a18fd0)
	/tmp/workdir/go/src/os/exec/exec.go:341 +0x43 fp=0x892ef1c sp=0x892ef0c pc=0x816a463
os/exec.(*Cmd).CombinedOutput(0x8a18fd0)
	/tmp/workdir/go/src/os/exec/exec.go:567 +0x89 fp=0x892ef30 sp=0x892ef1c pc=0x816b549
cmd/link.TestContentAddressableSymbols(0x89290e0)
	/tmp/workdir/go/src/cmd/link/link_test.go:879 +0x136 fp=0x892ef9c sp=0x892ef30 pc=0x83824b6
testing.tRunner(0x89290e0, 0x842c054)
	/tmp/workdir/go/src/testing/testing.go:1410 +0x10d fp=0x892efe4 sp=0x892ef9c pc=0x813d19d
testing.(*T).Run.func1()
	/tmp/workdir/go/src/testing/testing.go:1457 +0x28 fp=0x892eff0 sp=0x892efe4 pc=0x813df78
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_386.s:1311 +0x1 fp=0x892eff4 sp=0x892eff0 pc=0x80ab211
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:1457 +0x36e

2021-10-29T18:34:24-903f313/netbsd-amd64-9_0
2021-10-01T15:59:38-e5ad363/netbsd-arm-bsiegert

goroutine 28 [syscall, 27 minutes]:
syscall.Syscall6(0x1c1, 0xd1f, 0xa09db4, 0x0, 0x9b27e0, 0x0, 0x0)
	/var/gobuilder/buildlet/go/src/syscall/asm_netbsd_arm.s:39 +0x8 fp=0xa09d5c sp=0xa09d58 pc=0x8d3f8
syscall.wait4(0xd1f, 0xa09db4, 0x0, 0x9b27e0)
	/var/gobuilder/buildlet/go/src/syscall/zsyscall_netbsd_arm.go:35 +0x54 fp=0xa09d94 sp=0xa09d5c pc=0x8a694
syscall.Wait4(0xd1f, 0xa09dd8, 0x0, 0x9b27e0)
	/var/gobuilder/buildlet/go/src/syscall/syscall_bsd.go:145 +0x3c fp=0xa09db8 sp=0xa09d94 pc=0x88c58
os.(*Process).wait(0x983290)
	/var/gobuilder/buildlet/go/src/os/exec_unix.go:44 +0x100 fp=0xa09df0 sp=0xa09db8 pc=0xb4f1c
os.(*Process).Wait(...)
	/var/gobuilder/buildlet/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0x98cc60)
	/var/gobuilder/buildlet/go/src/os/exec/exec.go:507 +0x50 fp=0xa09e2c sp=0xa09df0 pc=0x1482d0
os/exec.(*Cmd).Run(0x98cc60)
	/var/gobuilder/buildlet/go/src/os/exec/exec.go:341 +0x48 fp=0xa09e3c sp=0xa09e2c pc=0x147810
os/exec.(*Cmd).CombinedOutput(0x98cc60)
	/var/gobuilder/buildlet/go/src/os/exec/exec.go:567 +0x98 fp=0xa09e50 sp=0xa09e3c pc=0x14882c
cmd/link.TestIssue33979.func2({0x983200, 0x21}, {0x9ea0a0, 0x9, 0x9})
	/var/gobuilder/buildlet/go/src/cmd/link/link_test.go:199 +0x90 fp=0xa09ea8 sp=0xa09e50 pc=0x368e14
cmd/link.TestIssue33979.func3({0x9ea0a0, 0x9, 0x9})
	/var/gobuilder/buildlet/go/src/cmd/link/link_test.go:206 +0x60 fp=0xa09ecc sp=0xa09ea8 pc=0x368d5c
cmd/link.TestIssue33979(0x8834a0)
	/var/gobuilder/buildlet/go/src/cmd/link/link_test.go:239 +0x3bc fp=0xa09f98 sp=0xa09ecc pc=0x368790
testing.tRunner(0x8834a0, 0x41871c)
	/var/gobuilder/buildlet/go/src/testing/testing.go:1389 +0x118 fp=0xa09fe0 sp=0xa09f98 pc=0x1195d4
testing.(*T).Run.func1()
	/var/gobuilder/buildlet/go/src/testing/testing.go:1436 +0x30 fp=0xa09fec sp=0xa09fe0 pc=0x11a448
runtime.goexit()
	/var/gobuilder/buildlet/go/src/runtime/asm_arm.s:824 +0x4 fp=0xa09fec sp=0xa09fec pc=0x7d028
created by testing.(*T).Run
	/var/gobuilder/buildlet/go/src/testing/testing.go:1436 +0x3a0

2021-09-21T20:39:31-48cf96c/netbsd-arm-bsiegert
2021-09-14T14:27:57-181e8cd/netbsd-arm-bsiegert
2021-04-29T15:47:16-12eaefe/freebsd-amd64-11_4
2021-04-28T13:49:52-4fe324d/netbsd-386-9_0
2021-03-05T02:30:31-b62da08/netbsd-386-9_0
2021-02-19T00:40:05-95a44d2/netbsd-arm64-bsiegert
2019-09-04T21:52:18-aae0b5b/linux-ppc64le-power9osu

#44801 may be closely related.

Note that many of these failures are on architectures not believed to be affected by #49209.

@bsiegert, @coypoop: any ideas?

@bcmills bcmills added NeedsInvestigation OS-NetBSD labels Dec 13, 2021
@bcmills bcmills added this to the Backlog milestone Dec 13, 2021
@bcmills
Contributor Author

bcmills commented Dec 13, 2021

Could also be related to #48789.

@bcmills
Contributor Author

bcmills commented Dec 13, 2021

The stuck calls appear to be running go run, go tool link, etc. Since these are Go binaries, probably a good starting point would be to send the stuck processes SIGQUIT to try to get goroutine dumps, and then to send them SIGKILL if they still don't respond.

That would at least help us to determine whether the hang is in the subprocess or the parent process.

(I think @aclements and @mknyszek were working on retrofitting that logic to various tests?)

@cherrymui
Member

The arm and arm64 ones may be due to a slow machine.

Yeah, sending a SIGQUIT at timeout is probably a good idea.

@aclements
Member

I have CL 370665 to apply timeouts to nearly every subprocess invocation in the runtime tests (though I wasn't planning to land it until the tree reopens). These failures are all in cmd/link or cmd/link/internal/ld. I could roll a CL to use RunWithTimeout in those tests.

@bcmills
Contributor Author

bcmills commented Jan 7, 2022

Looks like the same failure mode in go/internal/gcimporter too: https://build.golang.org/log/0e1b9a393109ba16005d18ff9faca47c50728a8f

goroutine 167 [syscall, 2 minutes]:
syscall.Syscall6(0x1a4b, 0x8853cf0, 0x0, 0x8d7c320, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/syscall/asm_unix_386.s:43 +0x5 fp=0x8853c9c sp=0x8853c98 pc=0x80b4015
syscall.wait4(0x1a4b, 0x8853cf0, 0x0, 0x8d7c320)
	/tmp/workdir/go/src/syscall/zsyscall_netbsd_386.go:34 +0x5b fp=0x8853cd4 sp=0x8853c9c pc=0x80b28db
syscall.Wait4(0x1a4b, 0x8853d14, 0x0, 0x8d7c320)
	/tmp/workdir/go/src/syscall/syscall_bsd.go:144 +0x3b fp=0x8853cf8 sp=0x8853cd4 pc=0x80b269b
os.(*Process).wait(0x89d1530)
	/tmp/workdir/go/src/os/exec_unix.go:43 +0x82 fp=0x8853d2c sp=0x8853cf8 pc=0x80c9d32
os.(*Process).Wait(...)
	/tmp/workdir/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0x89549a0)
	/tmp/workdir/go/src/os/exec/exec.go:507 +0x4d fp=0x8853d70 sp=0x8853d2c pc=0x8139b7d
os/exec.(*Cmd).Run(0x89549a0)
	/tmp/workdir/go/src/os/exec/exec.go:341 +0x43 fp=0x8853d80 sp=0x8853d70 pc=0x8138f63
os/exec.(*Cmd).CombinedOutput(0x89549a0)
	/tmp/workdir/go/src/os/exec/exec.go:567 +0x89 fp=0x8853d94 sp=0x8853d80 pc=0x813a049
go/internal/gcimporter_test.compile(0x8801b30, {0x88c4300, 0x1e}, {0x88aaa80, 0xe}, {0x89d1500, 0x27})
	/tmp/workdir/go/src/go/internal/gcimporter/gcimporter_test.go:50 +0x28d fp=0x8853e18 sp=0x8853d94 pc=0x822216d
go/internal/gcimporter_test.TestImportTypeparamTests.func1(0x8801b30)
	/tmp/workdir/go/src/go/internal/gcimporter/gcimporter_test.go:201 +0x406 fp=0x8853f9c sp=0x8853e18 pc=0x8223a86
testing.tRunner(0x8801b30, 0x8970ca0)
	/tmp/workdir/go/src/testing/testing.go:1440 +0x10d fp=0x8853fe4 sp=0x8853f9c pc=0x8109a0d
testing.(*T).Run.func1()
	/tmp/workdir/go/src/testing/testing.go:1487 +0x28 fp=0x8853ff0 sp=0x8853fe4 pc=0x810a7e8
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_386.s:1311 +0x1 fp=0x8853ff4 sp=0x8853ff0 pc=0x80a8a51
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:1487 +0x36e

@bcmills
Contributor Author

bcmills commented Mar 10, 2022

greplogs --dashboard -md -l -e 'panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait(?:.*\n)+FAIL\s+cmd/link' --since=2022-01-08

2022-03-10T09:12:04-5a040c5/netbsd-amd64-9_0

@bcmills bcmills changed the title cmd/link: tests hanging in os.(*Process).Wait on netbsd builders os: (*Process).Wait sometimes hangs on netbsd Apr 6, 2022
@bcmills
Contributor Author

bcmills commented Apr 6, 2022

Broadening the regexp to search for os.(*Process).Wait generally, since this symptom does not appear to be specific to cmd/link as far as I can see.

greplogs --dashboard -md -l -e '\Anetbsd-(.*\n)*panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait' --since=2022-01-01

@bcmills
Contributor Author

bcmills commented Apr 19, 2022

greplogs --dashboard -md -l -e '\Anetbsd-(.*\n)*panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait' --since=2022-04-06

2022-04-18T22:07:54-f49e802/netbsd-arm-bsiegert

  • Stuck for 29 minutes in cmd/link/internal/ld.TestMemProfileCheck, on a go run of a test program that prints runtime.MemProfileRate and then exits.

@bcmills
Contributor Author

bcmills commented Apr 19, 2022

@bsiegert, @coypoop: given that os/exec is used pervasively, I can't easily filter out these failures during dashboard triage. I also don't see how Go on NetBSD could be usable in a production setting with what appears to be a pervasive deadlock in such a fundamental API. Is there something that can be done to move this forward on the NetBSD side?

@bsiegert
Contributor

/cc @zoulasc @tklauser

@coypoop
Contributor

coypoop commented Apr 20, 2022

> The arm and arm64 ones may be due to a slow machine.

Is this still a suspicion? I assume the netbsd/arm builder is extra slow.
I didn't try this, but netbsd/arm64's 32-bit compat might be sufficient for running arm32 tests.

@riastradh

Do you have steps to reproduce?

I'm having a little trouble following the initial report, because it seems to cover several operating systems and architectures. Does this happen every time on NetBSD, or on NetBSD/arm, or only sometimes, or what?

If it happens only sometimes, how long do successful test runs take on the machines where it fails?

@bcmills
Contributor Author

bcmills commented Apr 20, 2022

> The arm and arm64 ones may be due to a slow machine.

> Is this still a suspicion? I assume the netbsd/arm builder is extra slow.

I can't speak for @cherrymui, but given the similar failures on the -amd64 and -386 builders I believe this is likely an architecture-independent synchronization bug in either the NetBSD kernel or the Go os implementation for NetBSD.

@bcmills
Contributor Author

bcmills commented Apr 20, 2022

> Do you have steps to reproduce?

Unfortunately no. The failures listed above were found organically in the Go build dashboard — the repro rate is high enough to be significant but not high enough to reproduce on demand.

> I'm having a little trouble following the initial report, because it seems to cover several operating systems and architectures.

The freebsd failure in 2021-04-29T15:47:16-12eaefe was likely #46272, and hasn't been seen since that was addressed.
The linux-ppc64le failure was from 2019, and may have been an organic timeout. (It hasn't been a recurring pattern.)
The remainder AFAICT are on NetBSD, on varying architectures.

> Does this happen every time on NetBSD, or on NetBSD/arm, or only sometimes, or what?

Intermittently, on NetBSD across all of the architectures for which we have builders.

@bcmills
Contributor Author

bcmills commented Apr 20, 2022

(Note that we also tried to use wait6 for this on NetBSD, but had to roll it back because of other deadlocks on that OS — see #48789.)

@bcmills
Contributor Author

bcmills commented Apr 25, 2022

greplogs --dashboard -md -l -e '\Anetbsd-(.*\n)*panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait' --since=2022-04-19

@bcmills
Contributor Author

bcmills commented May 13, 2022

greplogs --dashboard -md -l -e '\Anetbsd-(.*\n)*panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait' --since=2022-04-23
2022-05-12T22:29:02-da0a6f4/netbsd-arm-bsiegert
2022-05-12T20:19:10-27ace7a/netbsd-arm-bsiegert

Very curious that these recent ones seem to occur in pairs. 🤔

(attn @golang/netbsd)

@bcmills
Contributor Author

bcmills commented May 16, 2022

Then again, the pairings might just be a coincidence.

greplogs --dashboard -md -l -e '\Anetbsd-(.*\n)*panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait' --since=2022-05-13
2022-05-13T19:45:43-ba8310c/netbsd-amd64-9_0

@bcmills
Contributor Author

bcmills commented May 31, 2022

greplogs -l -e '\Anetbsd-(.*\n)*panic: test timed out.*\n\n(?:goroutine .*:\n(?:.+\n\t.+\n)+\n)*goroutine \d+ \[syscall, \d+ minutes\]:\n(?:.+\n\t.+\n)*os\.\(\*Process\)\.Wait' --since=2022-05-14
2022-05-27T14:57:14-590b53f/netbsd-amd64-9_0 (on release-branch.go1.17)
2022-05-17T03:26:28-41b9d8c/netbsd-arm64-bsiegert (on the main branch)

@gopherbot
Contributor

Change https://go.dev/cl/409595 mentions this issue: dashboard: mark all current netbsd builders as affected by golang/go#50138

gopherbot pushed a commit to golang/build that referenced this issue May 31, 2022
…50138

Since a large fraction of Go tests invoke commands, this issue causes
noise on the builders that cannot be easily bypassed or filtered out.

Failures matching this issue have been observed on all four of the
current NetBSD builders. (The last such failure observed on a
non-NetBSD builder was on freebsd-amd64-11_4, and that builder is no
longer used; no matching failures have been observed on more recent
FreeBSD builders.)

Updates golang/go#50138.

Change-Id: Ied687a63a55407d19c5f1905e79111d302087937
Reviewed-on: https://go-review.googlesource.com/c/build/+/409595
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
@gopherbot
Contributor

Change https://go.dev/cl/442575 mentions this issue: os: split wait6 syscall wrapper into per-platform files

gopherbot pushed a commit that referenced this issue Oct 12, 2022
Dragonfly and FreeBSD both used numerical values for these constants
chosen to be the same as on Solaris. For some reason, NetBSD did not,
and happens to interpret value 0 as P_ALL instead of P_PID
(see https://github.com/NetBSD/src/blob/3323ceb7822f98b3d2693aa26fd55c4ded6d8ba4/sys/sys/idtype.h#L43-L44).

Using the correct value for P_PID should cause wait6 to wait for the
correct process, which may help to avoid the deadlocks reported in

For #50138.
Updates #13987.

Change-Id: I0eacd1faee4a430d431fe48f9ccf837f49c42f39
Reviewed-on: https://go-review.googlesource.com/c/go/+/442478
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
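A minimal sketch of why hard-coding 0 for P_PID went wrong. This is not the os package's actual wrapper; the idtypes table and the meaning helper are hypothetical. The values follow the commit message and the cited header: FreeBSD and DragonFly use the Solaris numbering (P_PID = 0, P_ALL = 7), while NetBSD's sys/idtype.h declares P_ALL first, so P_ALL = 0 and P_PID = 1. A wait6 call meant to wait on one specific child therefore asked NetBSD to wait on any child.

```go
package main

import "fmt"

// idtypes holds the numeric values of two idtype_t constants for wait6(2).
type idtypes struct {
	Pid int // P_PID: wait for the process whose ID is id
	All int // P_ALL: wait for any child; id is ignored
}

// wait6IDTypes is a hypothetical per-OS table (values from each OS's headers).
var wait6IDTypes = map[string]idtypes{
	"dragonfly": {Pid: 0, All: 7},
	"freebsd":   {Pid: 0, All: 7},
	"netbsd":    {Pid: 1, All: 0},
}

// meaning reports which idtype a raw wait6 idtype argument selects on goos.
func meaning(goos string, raw int) string {
	t, ok := wait6IDTypes[goos]
	switch {
	case !ok:
		return "unknown OS"
	case raw == t.Pid:
		return "P_PID"
	case raw == t.All:
		return "P_ALL"
	default:
		return "other"
	}
}

func main() {
	// A wrapper that passes 0 meaning "P_PID" is correct on DragonFly and
	// FreeBSD, but on NetBSD it silently requests P_ALL.
	for _, goos := range []string{"dragonfly", "freebsd", "netbsd"} {
		fmt.Printf("%s: idtype 0 means %s\n", goos, meaning(goos, 0))
	}
}
```

Keeping such constants in per-platform files, as the follow-up change splitting the wait6 wrapper does, makes this kind of mismatch harder to miss than a shared constant behind chained build conditions.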
@bcmills bcmills added the WaitingForInfo label Oct 12, 2022
@bcmills
Contributor Author

bcmills commented Oct 12, 2022

https://go.dev/cl/442478 seems like a plausible fix for this issue. Waiting to see if there's more.

The TryBots for that change hit a deadlock that looks plausibly related to the missed fprintf wakeup from the C program I tried yesterday (#50138 (comment)). If that's all that is left, I suggest that we report it as a separate issue, since it has a somewhat different symptom.

@bcmills
Contributor Author

bcmills commented Oct 12, 2022

I've filed the new failure mode separately as #56180.

@riastradh

> I left this attempt at a C reproducer running overnight on a netbsd-amd64-9_0 gomote instance, but by the morning it ended up deadlocked in fprintf instead of seeing a spurious return from wait6. Apparently there is also some kind of rare race condition in NetBSD's fprintf that causes a missed wakeup on a condition variable? 😅

The fprintf issue you hit may be a bug in the interaction between libpthread and the dynamic loader rtld which we have since fixed in HEAD and netbsd-9 (but not in netbsd-9.0):

https://nxr.netbsd.org/xref/src/lib/libpthread/pthread.c?r=1.181#418

Please let us know if you can still reproduce it on a current system.

> https://go.dev/cl/442478 seems like a plausible fix for this issue. Waiting to see if there's more.

Surely that wouldn't affect the deadlock you saw with wait4, would it? If Go uses wait4 instead of wait6, do you still see the deadlock? (Forgive me if I missed something -- there have been a lot of updates in quick succession which I didn't follow all of.)

@bcmills
Contributor Author

bcmills commented Oct 12, 2022

The wait4 version of the deadlock is plausibly #13987, depending on how quickly NetBSD recycles PIDs.

@bcmills
Copy link
Contributor Author

bcmills commented Oct 12, 2022

Updating to a more recent NetBSD release is #54773.

@riastradh

> Updating to a more recent NetBSD release is #54773.

Unfortunately the libpthread/rtld fix didn't make it into 9.3! I didn't realize it until a couple of days after 9.3 went out, sorry. But if you can reproduce the C program's fprintf hang on a regular NetBSD install in a VM, without the Go test harness, that would be helpful, especially if you can do it on a system with the debug.tgz (or debug.tar.xz) set installed so we get full stack traces. I'll see if I can reproduce it, but half an hour of running it, both on a NetBSD<=9.3 library without the fix and on a current library with the fix, hasn't turned anything up yet.

@bsiegert
Contributor

bsiegert commented Oct 13, 2022 via email

gopherbot pushed a commit that referenced this issue Oct 13, 2022
This will dump more goroutines if the test happens to fail.

For #50138.

Change-Id: Ifae30b5ba8bddcdaa9250dd90be8d8ba7d5604d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/442476
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
gopherbot pushed a commit that referenced this issue Oct 13, 2022
If we use the "pipetest" helper command instead of "sleep",
we can use its stdout pipe to determine when the process
is ready to handle a SIGSTOP, and we can additionally check
that sending a SIGCONT actually causes the process to continue.

This also allows us to remove the "sleep" helper command,
making the test file somewhat more concise.

Noticed while looking into #50138.

Change-Id: If4fdee4b1ddf28c6ed07ec3268c81b73c2600238
Reviewed-on: https://go-review.googlesource.com/c/go/+/442576
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
gopherbot pushed a commit that referenced this issue Oct 13, 2022
There are getting to be enough special cases in this wrapper that
the increase in clarity from having a single file is starting to be
outweighed by the complexity from chained conditionals.

Updates #50138.
Updates #13987.

Change-Id: If4f1be19c0344e249aa6092507c28363ca6c8438
Reviewed-on: https://go-review.googlesource.com/c/go/+/442575
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
@bcmills bcmills removed the WaitingForInfo label Oct 14, 2022
@bcmills
Contributor Author

bcmills commented Oct 14, 2022

The new failure mode in #56180 is consistent with the aforementioned libpthread bug, and the remaining failures on the builders after CL 442478 all seem to plausibly match that failure mode.

I'm going to close this issue as “one bug fixed”, with the remaining hang tracked in #56180. (If there are other failure modes after the builders are upgraded in #54773, we can open new issues for those.)

@bcmills bcmills closed this as completed Oct 14, 2022
romaindoumenc pushed four commits to TroutSoftware/go that referenced this issue Nov 3, 2022 (copies of CLs 442478, 442476, 442576, and 442575 above).
@gopherbot
Contributor

Change https://go.dev/cl/454755 mentions this issue: dashboard: remove NetBSD 9.0 builders

gopherbot pushed a commit to golang/build that referenced this issue Dec 2, 2022
The netbsd-386 and netbsd-amd64 builders with 9.3 contain various
bugfixes, including to libpthread, that prevent test flakes. Remove the
older version now.

While here, remove issue golang/go#50138 (fixed) from the netbsd-arm*
builders.

Fixes golang/go#54773.

Change-Id: Ibccf0817a69a3dd74651bd5a3f50ab77c3a92beb
Reviewed-on: https://go-review.googlesource.com/c/build/+/454755
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
@golang golang locked and limited conversation to collaborators Dec 2, 2023
9 participants