tikv: fix infinite panic when GRPC transportReader is broken #18437

Merged
merged 7 commits into from
Jul 14, 2020

Conversation

lysu
Contributor

@lysu lysu commented Jul 8, 2020

What problem does this PR solve?

Issue Number: refer #18339

Problem Summary:

When a panic like #18339 happens, batchClient recovers from the panic without re-creating the stream, so it hits the same panic again and again.

What is changed and how it works?

What's Changed, How it Works:

Although we don't know what breaks the transportReader, we can stop the infinite panic loop by converting the panic into an error in the recv method; the error then triggers re-creation of the stream.

This lets the client recover from a broken transportReader.

Related changes

  • Need to cherry-pick to the release branch 4.0

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Side effects

  • n/a

Release note

  • Fix the infinite panic that occurs when the gRPC transportReader is broken

@lysu
Contributor Author

lysu commented Jul 8, 2020

/rebuild

@codecov

codecov bot commented Jul 9, 2020

Codecov Report

Merging #18437 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18437   +/-   ##
===========================================
  Coverage   79.2752%   79.2752%           
===========================================
  Files           541        541           
  Lines        145295     145295           
===========================================
  Hits         115183     115183           
  Misses        20823      20823           
  Partials       9289       9289           

@tiancaiamao
Contributor

LGTM

@ti-srebot
Contributor

@tiancaiamao, thanks for your review.

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 10, 2020
@lysu lysu requested review from jackysp and coocood July 13, 2020 03:20
@lysu lysu added this to the v4.0.3 milestone Jul 14, 2020
@lysu lysu added the priority/release-blocker This issue blocks a release. Please solve it ASAP. label Jul 14, 2020
@lysu
Contributor Author

lysu commented Jul 14, 2020

@jackysp @coocood PTAL thx

@coocood
Member

coocood commented Jul 14, 2020

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Jul 14, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Jul 14, 2020
@coocood
Member

coocood commented Jul 14, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 14, 2020
@ti-srebot
Contributor

Your auto merge job has been accepted, waiting for:

  • 18468

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@lysu merge failed.

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@lysu merge failed.

@tiancaiamao
Contributor


[2020-07-14T07:42:06.374Z] ==================
[2020-07-14T07:42:06.374Z] WARNING: DATA RACE
[2020-07-14T07:42:06.374Z] Write at 0x00c00066fd88 by goroutine 56:
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/executor.ResetContextOfStmt()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/executor.go:1667 +0xe1d
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/session.(*session).ExecuteStmt()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:1140 +0x1f1
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/session.(*session).Execute()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:1081 +0x3ee
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/session.execRestrictedSQL()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:835 +0x15c
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/session.(*session).ExecRestrictedSQLWithContext()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:774 +0x25b
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/session.(*session).ExecRestrictedSQL()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/session/session.go:744 +0x92
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/statistics/handle.(*Handle).Update()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/statistics/handle/handle.go:162 +0x1eb
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/domain.(*Domain).loadStatsWorker()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/domain/domain.go:1099 +0x571
[2020-07-14T07:42:06.374Z]
[2020-07-14T07:42:06.374Z] Previous read at 0x00c00066fd88 by goroutine 114:
[2020-07-14T07:42:06.374Z]   github.com/pingcap/tidb/executor.(*IndexLookUpExecutor).startIndexWorker.func1()
[2020-07-14T07:42:06.374Z]       /home/jenkins/agent/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/executor/distsql.go:514 +0x477
[2020-07-14T07:42:06.374Z]

@lysu
Contributor Author

lysu commented Jul 14, 2020

/merge

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@lysu merge failed.

@ti-srebot ti-srebot added status/LGT3 The PR has already had 3 LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Jul 14, 2020
@jackysp
Member

jackysp commented Jul 14, 2020

/merge

@ti-srebot
Contributor

Your auto merge job has been accepted, waiting for:

  • 18537
  • 18441

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@lysu merge failed.

@lysu
Contributor Author

lysu commented Jul 14, 2020

/run-unit-test

@lysu
Contributor Author

lysu commented Jul 14, 2020

/merge

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@lysu merge failed.

@lysu
Contributor Author

lysu commented Jul 14, 2020

/merge

@ti-srebot
Contributor

/run-all-tests

@ti-srebot
Contributor

@lysu merge failed.

@lysu
Contributor Author

lysu commented Jul 14, 2020

/run-unit-test

@lysu lysu merged commit ca41972 into pingcap:master Jul 14, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jul 14, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor

cherry pick to release-4.0 in PR #18562

coocood pushed a commit that referenced this pull request Jul 15, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: lysu <sulifx@gmail.com>
@ghost ghost mentioned this pull request Jul 21, 2020
Labels
component/tikv priority/release-blocker This issue blocks a release. Please solve it ASAP. status/can-merge Indicates a PR has been approved by a committer. status/LGT3 The PR has already had 3 LGTM. type/bugfix This PR fixes a bug.
5 participants