Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix data race in the suit #1165

Merged
merged 2 commits into from
May 4, 2022

Conversation

hawkingrei
Copy link
Contributor

@hawkingrei hawkingrei commented Mar 3, 2022

Signed-off-by: Weizhen Wang wangweizhen@pingcap.com

Summary

we find a data race in the suit. so we think Mutex is added to the suit for solving it.

ref pingcap/tidb#32747

==================
WARNING: DATA RACE
Write at 0x00c026a29588 by goroutine 117:
  github.com/stretchr/testify/suite.(*Suite).SetT()
      /home/prow/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:37 +0x148
  github.com/pingcap/tidb/executor_test.(*infosSchemaClusterTableSuite).SetT()
      <autogenerated>:1 +0x44
  github.com/stretchr/testify/suite.Run.func1()
      /home/prow/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:128 +0xcc
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47
Previous read at 0x00c026a29588 by goroutine 209:
  github.com/stretchr/testify/suite.(*Suite).Require()
      /home/prow/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:42 +0x5a
  github.com/pingcap/tidb/executor_test.(*infosSchemaClusterTableSuite).setUpRPCService.func1()
      /go/tidb/executor/infoschema_cluster_table_test.go:86 +0x46
Goroutine 117 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1306 +0x726
  github.com/stretchr/testify/suite.runTests()
      /home/prow/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:203 +0x18f
  github.com/stretchr/testify/suite.Run()
      /home/prow/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:176 +0x994
  github.com/pingcap/tidb/executor_test.TestInfoSchemaClusterTable()
      /go/tidb/executor/infoschema_cluster_table_test.go:57 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47
Goroutine 209 (running) created at:
  github.com/pingcap/tidb/executor_test.(*infosSchemaClusterTableSuite).setUpRPCService()
      /go/tidb/executor/infoschema_cluster_table_test.go:85 +0x5d2
  github.com/pingcap/tidb/executor_test.(*infosSchemaClusterTableSuite).SetupSuite()
      /go/tidb/executor/infoschema_cluster_table_test.go:62 +0x184
  github.com/stretchr/testify/suite.Run()
      /home/prow/go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:118 +0x5b7
  github.com/pingcap/tidb/executor_test.TestInfoSchemaClusterTable()
      /go/tidb/executor/infoschema_cluster_table_test.go:57 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47
================== 

Changes

add Mutex to the suit.

Motivation

fix data race.

Related issues

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei
Copy link
Contributor Author

@boyan-soubachov PTAL

@lance6716
Copy link

another occurrence

[2022-03-17T11:45:36.552Z] WARNING: DATA RACE
[2022-03-17T11:45:36.552Z] Read at 0x00c0010c1350 by goroutine 239:
[2022-03-17T11:45:36.552Z]   github.com/pingcap/tiflow/dm/dm/master/scheduler.(*testSchedulerSuite).TestUpdateSubTasksAndSourceCfg.func1()
[2022-03-17T11:45:36.552Z]       /home/jenkins/agent/workspace/ut-check/tiflow/dm/dm/master/scheduler/scheduler_test.go:2089 +0xa5
[2022-03-17T11:45:36.552Z] 
[2022-03-17T11:45:36.552Z] Previous write at 0x00c0010c1350 by goroutine 199:
[2022-03-17T11:45:36.552Z]   github.com/stretchr/testify/suite.(*Suite).SetT()
[2022-03-17T11:45:36.552Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:36 +0xce
[2022-03-17T11:45:36.552Z]   github.com/pingcap/tiflow/dm/dm/master/scheduler.(*testSchedulerSuite).SetT()
[2022-03-17T11:45:36.552Z]       <autogenerated>:1 +0x29
[2022-03-17T11:45:36.552Z]   github.com/stretchr/testify/suite.Run.func1.1()
[2022-03-17T11:45:36.552Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:144 +0x2f4
[2022-03-17T11:45:36.552Z]   runtime.deferreturn()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/runtime/panic.go:436 +0x32
[2022-03-17T11:45:36.552Z]   testing.tRunner()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1439 +0x213
[2022-03-17T11:45:36.552Z]   testing.(*T).Run.func1()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1486 +0x47
[2022-03-17T11:45:36.552Z] 
[2022-03-17T11:45:36.552Z] Goroutine 239 (running) created at:
[2022-03-17T11:45:36.552Z]   github.com/pingcap/tiflow/dm/dm/master/scheduler.(*testSchedulerSuite).TestUpdateSubTasksAndSourceCfg()
[2022-03-17T11:45:36.552Z]       /home/jenkins/agent/workspace/ut-check/tiflow/dm/dm/master/scheduler/scheduler_test.go:2088 +0xda4
[2022-03-17T11:45:36.552Z]   runtime.call16()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/runtime/asm_amd64.s:701 +0x48
[2022-03-17T11:45:36.552Z]   reflect.Value.Call()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/reflect/value.go:339 +0xd7
[2022-03-17T11:45:36.552Z]   github.com/stretchr/testify/suite.Run.func1()
[2022-03-17T11:45:36.552Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:158 +0x6dc
[2022-03-17T11:45:36.552Z]   testing.tRunner()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1439 +0x213
[2022-03-17T11:45:36.552Z]   testing.(*T).Run.func1()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1486 +0x47
[2022-03-17T11:45:36.552Z] 
[2022-03-17T11:45:36.552Z] Goroutine 199 (finished) created at:
[2022-03-17T11:45:36.552Z]   testing.(*T).Run()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1486 +0x724
[2022-03-17T11:45:36.552Z]   github.com/stretchr/testify/suite.runTests()
[2022-03-17T11:45:36.552Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:203 +0x198
[2022-03-17T11:45:36.552Z]   github.com/stretchr/testify/suite.Run()
[2022-03-17T11:45:36.552Z]       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:176 +0x969
[2022-03-17T11:45:36.552Z]   github.com/pingcap/tiflow/dm/dm/master/scheduler.TestSchedulerSuite()
[2022-03-17T11:45:36.552Z]       /home/jenkins/agent/workspace/ut-check/tiflow/dm/dm/master/scheduler/scheduler_test.go:51 +0x44
[2022-03-17T11:45:36.552Z]   testing.tRunner()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1439 +0x213
[2022-03-17T11:45:36.552Z]   testing.(*T).Run.func1()
[2022-03-17T11:45:36.552Z]       /usr/local/go/src/testing/testing.go:1486 +0x47
[2022-03-17T11:45:36.552Z] ==================

@hawkingrei
Copy link
Contributor Author

@boyan-soubachov Hi~ Can you review it for us?

Copy link
Collaborator

@boyan-soubachov boyan-soubachov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your calling code trying to change the suite state concurrently? That seems to be the only way I can think of reproducing this problem.

Do you mind pasting some example code that would cause this issue?

suite/suite.go Outdated
require *require.Assertions
t *testing.T
}

// T retrieves the current *testing.T context.
func (suite *Suite) T() *testing.T {
suite.mu.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a read-only operation; I'm not sure it needs a mutex?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a sync.RWMutex and suite.mu.RLock()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is a sync.RWMutex.

@lance6716
Copy link

lance6716 commented Mar 18, 2022

Hi @boyan-soubachov I'm not familiar with go tests, for my case in #1165 (comment)

Previous write at 0x00c0010c1350 by goroutine 199:
...
github.com/stretchr/testify/suite.Run.func1.1()
       /go/pkg/mod/github.com/stretchr/testify@v1.7.0/suite/suite.go:144 +0x2f4

Can it be the the defer of InternalTest.F does not wait test-spawned goroutine createdfinished? I'll take a look soon

@lance6716
Copy link

opened an issue #1168

@hawkingrei
Copy link
Contributor Author

Is your calling code trying to change the suite state concurrently? That seems to be the only way I can think of reproducing this problem.

Do you mind pasting some example code that would cause this issue?

It is shown in the

// SuiteGoroutineRace is intended to test there should be no race when creating
// goroutine in tests.
type SuiteGoroutineRace struct{ Suite }

// TestSuiteRequireTwice...
func TestSuiteGoroutineRace(t *testing.T) {
	ok := testing.RunTests(
		allTestsFilter,
		[]testing.InternalTest{{
			Name: "TestSuiteGoroutineRace",
			F: func(t *testing.T) {
				suite := new(SuiteGoroutineRace)
				Run(t, suite)
			},
		}},
	)
	assert.Equal(t, true, ok)
}

func (s *SuiteGoroutineRace) TestGoroutine() {
	go func() {
		s.Equal(1, 1)
		s.Equal(1, 1)
		s.Equal(1, 1)
	}()
	go func() {
		s.Equal(1, 1)
		s.Equal(1, 1)
		s.Equal(1, 1)
	}()
	go func() {
		s.Equal(1, 1)
		s.Equal(1, 1)
		s.Equal(1, 1)
	}()
}
=== RUN   TestSuiteGoroutineRace
=== RUN   TestSuiteGoroutineRace
=== CONT  TestSuiteGoroutineRace
    testing.go:1152: race detected during execution of test
--- FAIL: TestSuiteGoroutineRace (0.01s)
=== RUN   TestSuiteGoroutineRace/TestGoroutine
==================
WARNING: DATA RACE
Write at 0x00c0000b8150 by goroutine 9:
  github.com/stretchr/testify/suite.(*Suite).SetT()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite.go:36 +0xc4
  github.com/stretchr/testify/suite.(*SuiteGoroutineRace).SetT()
      <autogenerated>:1 +0x44
  github.com/stretchr/testify/suite.Run.func1.1()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite.go:144 +0x2f4
  github.com/stretchr/testify/suite.Run.func1()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite.go:159 +0x71f
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47

Previous read at 0x00c0000b8150 by goroutine 10:
  github.com/stretchr/testify/suite.(*SuiteGoroutineRace).TestGoroutine.func1()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite_test.go:612 +0x36

Goroutine 9 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1306 +0x726
  github.com/stretchr/testify/suite.runTests()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite.go:203 +0x18f
  github.com/stretchr/testify/suite.Run()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite.go:176 +0x994
  github.com/stretchr/testify/suite.TestSuiteGoroutineRace.func1()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite_test.go:603 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47

Goroutine 10 (finished) created at:
  github.com/stretchr/testify/suite.(*SuiteGoroutineRace).TestGoroutine()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite_test.go:611 +0x90
  runtime.call16()
      /usr/local/go/src/runtime/asm_amd64.s:625 +0x48
  reflect.Value.Call()
      /usr/local/go/src/reflect/value.go:339 +0xd7
  github.com/stretchr/testify/suite.Run.func1()
      /home/lance/go/src/github.com/stretchr/testify/suite/suite.go:158 +0x71c
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47
==================
    testing.go:1152: race detected during execution of test
    --- FAIL: TestSuiteGoroutineRace/TestGoroutine (0.00s)

--- FAIL: TestSuiteGoroutineRace (0.01s)


=== CONT  
    testing.go:1152: race detected during execution of test
    suite_test.go:607: 
        	Error Trace:	suite_test.go:607
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestSuiteGoroutineRace
    testing.go:1152: race detected during execution of test
=== CONT  
    testing.go:1152: race detected during execution of test
FAIL

@hawkingrei
Copy link
Contributor Author

Is your calling code trying to change the suite state concurrently? That seems to be the only way I can think of reproducing this problem.

Do you mind pasting some example code that would cause this issue?

the code of case in the PR body is this

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei
Copy link
Contributor Author

@lance6716 @boyan-soubachov PTAL

@lance6716
Copy link

@lance6716 @boyan-soubachov PTAL

I think at least you can add an unit test as the comment above, and wait for the opinions of maintainers

@tiancaiamao
Copy link

Is your calling code trying to change the suite state concurrently? That seems to be the only way I can think of reproducing this problem.

Do you mind pasting some example code that would cause this issue?

I guess it's caused by the wrong usage of the testify package?
I have a simple fix here, the Require() function is not designed to be thread-safe?
pingcap/tidb#33837

@tisonkun
Copy link

I have a simple fix here, the Require() function is not designed to be thread-safe?

That's what I've stated in pingcap/tidb#26022 and #187 (comment). suite employs a global state to keep "current *testing.T" and it sucks when run in parallel situation.

@lance6716
Copy link

lance6716 commented Apr 11, 2022

T set to the parent T so the child test failed but reported as passed

this quote is from #187 (comment). IMO if we want to achieve it, we should let main test goroutine in child test wait its spawned goroutine

see my point of view in #1168 (comment)

SaveTheRbtz added a commit to SaveTheRbtz/flow-go that referenced this pull request Apr 11, 2022
SaveTheRbtz added a commit to SaveTheRbtz/flow-go that referenced this pull request Apr 12, 2022
@hawkingrei
Copy link
Contributor Author

@boyan-soubachov PTAL

@boyan-soubachov
Copy link
Collaborator

The Read/Write mutexes are definitely an improvement, thanks!

@lance6716 has a good point about an alternative where we require a pattern from the user but it doesn't seem intuitive or simple.

I'm happy to proceed with @hawkingrei 's code. @lance6716 (and anyone else), any counter-opinions or thoughts on why we shouldn't go with the RWMutex approach?

@lance6716
Copy link

The Read/Write mutexes are definitely an improvement, thanks!

@lance6716 has a good point about an alternative where we require a pattern from the user but it doesn't seem intuitive or simple.

I'm happy to proceed with @hawkingrei 's code. @lance6716 (and anyone else), any counter-opinions or thoughts on why we shouldn't go with the RWMutex approach?

the RWMutex solution seems good to me 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants