Data race in TestRaftClusterMultipleRestart #7391

Closed
rleungx opened this issue Nov 20, 2023 · 4 comments · Fixed by #7392, #7396 or #7370
Labels
type/ci The issue is related to CI.

Comments


rleungx commented Nov 20, 2023

Flaky Test

Which jobs are failing

2023-11-20T03:16:11.4068307Z ==================
2023-11-20T03:16:11.4068406Z WARNING: DATA RACE
2023-11-20T03:16:11.4068553Z Read at 0x00c003eff730 by goroutine 15345:
2023-11-20T03:16:11.4068852Z   github.com/tikv/pd/server/cluster.(*RaftCluster).GetBasicCluster()
2023-11-20T03:16:11.4069175Z       /home/runner/work/pd/pd/server/cluster/cluster.go:1090 +0x8b
2023-11-20T03:16:11.4069425Z   github.com/tikv/pd/pkg/schedule.(*Coordinator).ShouldRun()
2023-11-20T03:16:11.4069754Z       /home/runner/work/pd/pd/pkg/schedule/coordinator.go:724 +0xa8
2023-11-20T03:16:11.4069961Z   github.com/tikv/pd/pkg/schedule.(*Coordinator).Run()
2023-11-20T03:16:11.4070289Z       /home/runner/work/pd/pd/pkg/schedule/coordinator.go:389 +0x211
2023-11-20T03:16:11.4070540Z   github.com/tikv/pd/pkg/schedule.(*Coordinator).RunUntilStop()
2023-11-20T03:16:11.4070864Z       /home/runner/work/pd/pd/pkg/schedule/coordinator.go:372 +0x93
2023-11-20T03:16:11.4071196Z   github.com/tikv/pd/server/cluster.(*schedulingController).runCoordinator()
2023-11-20T03:16:11.4071609Z       /home/runner/work/pd/pd/server/cluster/scheduling_controller.go:120 +0x108
2023-11-20T03:16:11.4072107Z   github.com/tikv/pd/server/cluster.(*schedulingController).startSchedulingJobs.func2()
2023-11-20T03:16:11.4072512Z       /home/runner/work/pd/pd/server/cluster/scheduling_controller.go:97 +0x33
2023-11-20T03:16:11.4072521Z 
2023-11-20T03:16:11.4072694Z Previous write at 0x00c003eff730 by goroutine 3113:
2023-11-20T03:16:11.4072958Z   github.com/tikv/pd/server/cluster.(*RaftCluster).InitCluster()
2023-11-20T03:16:11.4073268Z       /home/runner/work/pd/pd/server/cluster/cluster.go:265 +0x109
2023-11-20T03:16:11.4073485Z   github.com/tikv/pd/server/cluster.(*RaftCluster).Start()
2023-11-20T03:16:11.4073909Z       /home/runner/work/pd/pd/server/cluster/cluster.go:295 +0x314
2023-11-20T03:16:11.4074495Z   github.com/tikv/pd/tests/server/cluster_test.TestRaftClusterMultipleRestart()
2023-11-20T03:16:11.4074894Z       /home/runner/work/pd/pd/tests/server/cluster/cluster_test.go:517 +0x55e
2023-11-20T03:16:11.4075054Z   github.com/pingcap/failpoint.parseTerm()
2023-11-20T03:16:11.4075743Z       /home/runner/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:149 +0x364
2023-11-20T03:16:11.4075888Z   github.com/pingcap/failpoint.parse()
2023-11-20T03:16:11.4076541Z       /home/runner/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:126 +0xa5
2023-11-20T03:16:11.4076701Z   github.com/pingcap/failpoint.newTerms()
2023-11-20T03:16:11.4077341Z       /home/runner/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/terms.go:98 +0x3e
2023-11-20T03:16:11.4077541Z   github.com/pingcap/failpoint.(*Failpoint).Enable()
2023-11-20T03:16:11.4078213Z       /home/runner/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoint.go:54 +0x3e
2023-11-20T03:16:11.4078418Z   github.com/pingcap/failpoint.(*Failpoints).Enable()
2023-11-20T03:16:11.4079116Z       /home/runner/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoints.go:105 +0x276
2023-11-20T03:16:11.4079265Z   github.com/pingcap/failpoint.Enable()
2023-11-20T03:16:11.4079955Z       /home/runner/go/pkg/mod/github.com/pingcap/failpoint@v0.0.0-20210918120811-547c13e3eb00/failpoints.go:225 +0x4da
2023-11-20T03:16:11.4080311Z   github.com/tikv/pd/tests/server/cluster_test.TestRaftClusterMultipleRestart()
2023-11-20T03:16:11.4080711Z       /home/runner/work/pd/pd/tests/server/cluster/cluster_test.go:515 +0x4db
2023-11-20T03:16:11.4080825Z   testing.tRunner()
2023-11-20T03:16:11.4081196Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:1595 +0x238
2023-11-20T03:16:11.4081316Z   testing.(*T).Run.func1()
2023-11-20T03:16:11.4081665Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:1648 +0x44
2023-11-20T03:16:11.4081678Z 
2023-11-20T03:16:11.4081800Z Goroutine 15345 (running) created at:
2023-11-20T03:16:11.4082160Z   github.com/tikv/pd/server/cluster.(*schedulingController).startSchedulingJobs()
2023-11-20T03:16:11.4082579Z       /home/runner/work/pd/pd/server/cluster/scheduling_controller.go:97 +0x219
2023-11-20T03:16:11.4082847Z   github.com/tikv/pd/server/cluster.(*RaftCluster).checkServices()
2023-11-20T03:16:11.4083164Z       /home/runner/work/pd/pd/server/cluster/cluster.go:371 +0x397
2023-11-20T03:16:11.4083460Z   github.com/tikv/pd/server/cluster.(*RaftCluster).runServiceCheckJob()
2023-11-20T03:16:11.4083769Z       /home/runner/work/pd/pd/server/cluster/cluster.go:393 +0x29e
2023-11-20T03:16:11.4084026Z   github.com/tikv/pd/server/cluster.(*RaftCluster).Start.func2()
2023-11-20T03:16:11.4084328Z       /home/runner/work/pd/pd/server/cluster/cluster.go:338 +0x33
2023-11-20T03:16:11.4084339Z 
2023-11-20T03:16:11.4084467Z Goroutine 3113 (running) created at:
2023-11-20T03:16:11.4084573Z   testing.(*T).Run()
2023-11-20T03:16:11.4084929Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:1648 +0x82a
2023-11-20T03:16:11.4085050Z   testing.runTests.func1()
2023-11-20T03:16:11.4085396Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:2054 +0x84
2023-11-20T03:16:11.4085653Z   testing.tRunner()
2023-11-20T03:16:11.4086006Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:1595 +0x238
2023-11-20T03:16:11.4086120Z   testing.runTests()
2023-11-20T03:16:11.4086463Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:2052 +0x896
2023-11-20T03:16:11.4086574Z   testing.(*M).Run()
2023-11-20T03:16:11.4086908Z       /opt/hostedtoolcache/go/1.21.3/x64/src/testing/testing.go:1925 +0xb57
2023-11-20T03:16:11.4087007Z   main.main()
2023-11-20T03:16:11.4087170Z       _testmain.go:127 +0x2e4
2023-11-20T03:16:11.4087258Z ==================

CI link

https://github.com/tikv/pd/actions/runs/6925032433/job/18835104284

Reason for failure (if possible)

Anything else

rleungx added the type/ci label Nov 20, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 20, 2023
close #7391

Signed-off-by: Ryan Leung <rleungx@gmail.com>

lhy1024 commented Nov 20, 2023

lhy1024 reopened this Nov 20, 2023

rleungx commented Nov 20, 2023

The test itself is not right; it will be fixed by #7396.

ti-chi-bot bot pushed a commit that referenced this issue Nov 20, 2023
close #7391, close #7393, close #7394

Signed-off-by: Ryan Leung <rleungx@gmail.com>

lhy1024 commented Nov 20, 2023

again
https://github.com/tikv/pd/actions/runs/6928481549/job/18844364222?pr=7370

2023-11-20T10:13:53.0958820Z --- FAIL: TestRaftClusterMultipleRestart (3.65s)
2023-11-20T10:13:53.0959156Z     testing.go:1465: race detected during execution of test


rleungx commented Nov 21, 2023

The reason is that after the scheduling controller is stopped, there is a window in which startSchedulingJobs can run again before runServiceCheckJob exits.

ti-chi-bot bot pushed a commit that referenced this issue Nov 21, 2023
ref #5839, close #7391

Signed-off-by: Ryan Leung <rleungx@gmail.com>
rleungx added a commit to rleungx/pd that referenced this issue Dec 1, 2023
close tikv#7391

Signed-off-by: Ryan Leung <rleungx@gmail.com>
rleungx added a commit to rleungx/pd that referenced this issue Dec 1, 2023
close tikv#7391, close tikv#7393, close tikv#7394

Signed-off-by: Ryan Leung <rleungx@gmail.com>
rleungx added a commit to rleungx/pd that referenced this issue Dec 1, 2023
ref tikv#5839, close tikv#7391

Signed-off-by: Ryan Leung <rleungx@gmail.com>