Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Run unit tests with -race #8329

Open
8 of 10 tasks
dnephin opened this issue Jul 17, 2020 · 5 comments · Fixed by #8346
Open
8 of 10 tasks

ci: Run unit tests with -race #8329

dnephin opened this issue Jul 17, 2020 · 5 comments · Fixed by #8346
Labels
theme/reliability type/ci Relating to continuous integration (CI) tooling for testing or releases

Comments

@dnephin
Copy link
Contributor

dnephin commented Jul 17, 2020

I've attempted to enable 'go test -race', but there are many failures (example). Data races could cause flaky tests, and may even cause issues in production.

I'm starting work on this in #8269. This issue is to track packages that have failures so that we can fix them incrementally.

Related issues:

@dnephin dnephin added the type/ci Relating to continuous integration (CI) tooling for testing or releases label Jul 17, 2020
@dnephin
Copy link
Contributor Author

dnephin commented Jul 20, 2020

I'm seeing some of the TestCheck* tests in agent/checks fail on master due to races (https://circleci.com/gh/hashicorp/consul/209863, https://circleci.com/gh/hashicorp/consul/209838).

Some of the stacks are missing (I think that is due to another bug in Go that I patched and will be in go1.15). I tried to reproduce locally and it appears to be a race with t.Log and the test shutting down. I guess the goroutines outlive the test, so maybe the root cause is a leaked go routine.

I think #7948 may fix the immediate problem with the race.

stack traces
WARNING: DATA RACE
Read at 0x00c000496b83 by goroutine 28:
  testing.(*common).logDepth()
      /usr/lib/go/src/testing/testing.go:675 +0x14b
  testing.(*common).log()
      /usr/lib/go/src/testing/testing.go:662 +0x77
  testing.(*common).Log()
      /usr/lib/go/src/testing/testing.go:694 +0x1d
  testing.(*T).Log()
      <autogenerated>:1 +0x61
  github.com/hashicorp/consul/sdk/testutil.(*testWriter).Write()
      /home/daniel/pers/code/consul/sdk/testutil/testlog.go:86 +0x319
  github.com/hashicorp/go-hclog.(*writer).Flush()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/writer.go:29 +0x28c
  github.com/hashicorp/go-hclog.(*intLogger).log()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/intlogger.go:140 +0x20c
  github.com/hashicorp/go-hclog.(*intLogger).Debug()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/intlogger.go:442 +0x9f
  github.com/hashicorp/go-hclog.(*interceptLogger).Debug()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/interceptlogger.go:62 +0xaf
  github.com/hashicorp/consul/agent/checks.(*StatusHandler).updateCheck()
      /home/daniel/pers/code/consul/agent/checks/check.go:828 +0x2d6
  github.com/hashicorp/consul/agent/checks.(*CheckHTTP).check()
      /home/daniel/pers/code/consul/agent/checks/check.go:489 +0x88b
  github.com/hashicorp/consul/agent/checks.(*CheckHTTP).run()
      /home/daniel/pers/code/consul/agent/checks/check.go:423 +0xac

Previous write at 0x00c000496b83 by goroutine 18:
  testing.tRunner.func1()
      /usr/lib/go/src/testing/testing.go:977 +0x467
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:995 +0x20d

Goroutine 28 (running) created at:
  github.com/hashicorp/consul/agent/checks.(*CheckHTTP).Start()
      /home/daniel/pers/code/consul/agent/checks/check.go:402 +0x170
  github.com/hashicorp/consul/agent/checks.TestCheckHTTPBody.func2()
      /home/daniel/pers/code/consul/agent/checks/check_test.go:639 +0x700
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:991 +0x1eb

Goroutine 18 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1042 +0x660
  testing.runTests.func1()
      /usr/lib/go/src/testing/testing.go:1284 +0xa6
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:991 +0x1eb
  testing.runTests()
      /usr/lib/go/src/testing/testing.go:1282 +0x527
  testing.(*M).Run()
      /usr/lib/go/src/testing/testing.go:1199 +0x2ff
  github.com/hashicorp/consul/agent/checks.TestMain()
      /home/daniel/pers/code/consul/agent/checks/grpc_test.go:71 +0x5f8
  main.main()
      _testmain.go:112 +0x223

@dnephin
Copy link
Contributor Author

dnephin commented Jul 20, 2020

agent/router TestMain appears to be failing sometimes. CI missing the output (maybe another test2json bug?), but I was able to reproduce locally:

stack traces
==================
WARNING: DATA RACE
Read at 0x00c0000fb003 by goroutine 134:
  testing.(*common).logDepth()
      /usr/lib/go/src/testing/testing.go:669 +0xa1
  testing.(*common).log()
      /usr/lib/go/src/testing/testing.go:662 +0x77
  testing.(*common).Log()
      /usr/lib/go/src/testing/testing.go:694 +0x1d
  testing.(*T).Log()
      <autogenerated>:1 +0x61
  github.com/hashicorp/consul/sdk/testutil.(*testWriter).Write()
      /home/daniel/pers/code/consul/sdk/testutil/testlog.go:86 +0x319
  github.com/hashicorp/go-hclog.(*writer).Flush()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/writer.go:29 +0x28c
  github.com/hashicorp/go-hclog.(*intLogger).log()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/intlogger.go:140 +0x20c
  github.com/hashicorp/go-hclog.(*intLogger).Info()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/intlogger.go:452 +0x9f
  github.com/hashicorp/go-hclog.(*interceptLogger).Info()
      /home/daniel/go/pkg/mod/github.com/hashicorp/go-hclog@v0.12.0/interceptlogger.go:76 +0xaf
  github.com/hashicorp/consul/agent/router.(*Manager).Start()
      /home/daniel/pers/code/consul/agent/router/manager.go:526 +0x1bc

Previous write at 0x00c0000fb003 by goroutine 22:
  testing.tRunner.func1()
      /usr/lib/go/src/testing/testing.go:977 +0x467
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:995 +0x20d

Goroutine 134 (running) created at:
  github.com/hashicorp/consul/agent/router.(*Router).addServer()
      /home/daniel/pers/code/consul/agent/router/router.go:252 +0x613
  github.com/hashicorp/consul/agent/router.(*Router).AddArea()
      /home/daniel/pers/code/consul/agent/router/router.go:161 +0x66c
  github.com/hashicorp/consul/agent/router.TestRouter_Routing()
      /home/daniel/pers/code/consul/agent/router/router_test.go:172 +0x52e
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:991 +0x1eb

Goroutine 22 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1042 +0x660
  testing.runTests.func1()
      /usr/lib/go/src/testing/testing.go:1284 +0xa6
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:991 +0x1eb
  testing.runTests()
      /usr/lib/go/src/testing/testing.go:1282 +0x527
  testing.(*M).Run()
      /usr/lib/go/src/testing/testing.go:1199 +0x2ff
  main.main()
      _testmain.go:88 +0x223
==================

@dnephin dnephin reopened this Jul 21, 2020
@pierresouchay
Copy link
Contributor

pierresouchay commented Jul 21, 2020

checks is failing a lot as well it seems, on my machine, on 2nd/3rd run, this fails on master branch:

date; i=0;while go test -p 2 -race -gcflags=all=-d=checkptr=0 -timeout 60s github.com/hashicorp/consul/agent/checks; do go clean -testcache; i=$((i + 1)); done; echo failed at $i; date

@dnephin
Copy link
Contributor Author

dnephin commented Jul 21, 2020

That last PR (#8346) should have fixed checks for HTTP and Alias. Are you seeing the other ones fail as well? I commented in that PR description that I think the problem exists in all the Check types, but I hadn't seen the rest fail yet, so I wanted to get a quick fix merged to get things green.

I've been using go test -race -count=50 to reproduce the flakes, which also seems to work

@pierresouchay
Copy link
Contributor

indeed with the last patch I cannot reproduce anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/reliability type/ci Relating to continuous integration (CI) tooling for testing or releases
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants