tikv: fix infinitely rebirthed secondary keys commit retry goroutine during tikv error #16061
/run-all-tests
Codecov Report

| Coverage Diff | master | #16061 | +/- |
|---|---|---|---|
| Coverage | 80.6390% | 80.6390% | |
| Files | 506 | 506 | |
| Lines | 138185 | 138185 | |
| Hits | 111431 | 111431 | |
| Misses | 18178 | 18178 | |
| Partials | 8576 | 8576 | |
I guess the goroutine is created by the region error here?
https://github.com/pingcap/tidb/pull/16061/files#diff-499c236856cd9ce3300d3f5ccde41a23R1007
handleSingleBatch -->
regionErr != nil -->
c.commitMutations -->
doActionOnMutations -->
go doActionOnBatches -->
handleSingleBatch -->
So each retry adds one more goroutine, but the work it does is exactly the same?
sender := NewRegionRequestSender(c.store.regionCache, c.store.client)
resp, err := sender.SendReq(bo, req, batch.region, readTimeoutShort)

// If we fail to receive response for the request that commits primary key, it will be undetermined whether this
IMHO, extracting this function is not a good idea: this logic belongs to commit and is not a general one.
The reason for extracting this function is to make it mockable. In practice, letting a piece of code (sending the request) be replaced by test logic is a good reason to extract a method.
If we do not extract this function, we need to add a new "if else" block to the parent method.
I remember reading some test-driven development books on the question "when to extract a method, or when to use an interface?" Their answer was "extract a method or interface when you need to replace it", and when logic needs to be replaced in a test, there is a high probability it will be replaced by a future requirement too :D
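To illustrate the "extract it so a test can replace it" argument, here is a minimal sketch. It is not the code in this PR: `sender`, `committer`, `commitBatch` and `failAfter` are hypothetical names chosen only for the example.

```go
package main

import "fmt"

// sender is a hypothetical seam for the "send the request" step.
type sender interface {
	send(key string) error
}

// committer holds the surrounding logic; only the send step is replaceable.
type committer struct {
	s sender
}

// commitBatch sends every key and reports how many were sent before a failure.
func (c *committer) commitBatch(keys []string) (int, error) {
	for i, k := range keys {
		if err := c.s.send(k); err != nil {
			return i, fmt.Errorf("commit %q: %w", k, err)
		}
	}
	return len(keys), nil
}

// failAfter is a test double: it succeeds n times and then always fails.
type failAfter struct {
	n    int
	sent int
}

func (f *failAfter) send(string) error {
	if f.sent >= f.n {
		return fmt.Errorf("injected failure")
	}
	f.sent++
	return nil
}

func main() {
	c := &committer{s: &failAfter{n: 2}}
	n, err := c.commitBatch([]string{"a", "b", "c"})
	fmt.Println(n, err != nil) // prints "2 true"
}
```

With the seam in place, the test injects the failure from outside, and the parent method does not need an extra "if else" branch for test-only behaviour.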
Yes. The problem here is that the backoff's maxSleep can never be reached, so handleSingleBatch --> ... --> handleSingleBatch never stops. It might be better to avoid recreating the goroutine in the retry call, but making maxSleep work also solves the problem.
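The effect can be sketched with a toy model. This is only an illustration under simplifying assumptions: `backoffer`, `backoff`, `clone` and `retry` below are hypothetical stand-ins, the model records sleep time instead of sleeping, and `limit` exists only so that the demo terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// backoffer is a simplified, hypothetical retry budget (times in ms).
type backoffer struct {
	maxSleep   int // total budget
	totalSleep int // budget consumed so far
}

var errMaxSleep = errors.New("backoffer: maxSleep exceeded")

// backoff charges ms to the budget and fails once the budget is used up.
func (b *backoffer) backoff(ms int) error {
	b.totalSleep += ms
	if b.totalSleep >= b.maxSleep {
		return errMaxSleep
	}
	return nil
}

// clone copies the budget state, so the copy inherits totalSleep.
func (b *backoffer) clone() *backoffer {
	c := *b
	return &c
}

// retry simulates a request that always hits a region error.
// It returns the number of attempts made before giving up or hitting limit.
func retry(bo *backoffer, inherit bool, limit int) int {
	attempts := 0
	for attempts < limit {
		attempts++
		if err := bo.backoff(10); err != nil {
			return attempts // budget exhausted, stop retrying
		}
		if inherit {
			bo = bo.clone() // next attempt keeps the consumed budget
		} else {
			bo = &backoffer{maxSleep: bo.maxSleep} // next attempt starts from zero
		}
	}
	return attempts
}

func main() {
	fmt.Println(retry(&backoffer{maxSleep: 50}, true, 1000))  // prints 5
	fmt.Println(retry(&backoffer{maxSleep: 50}, false, 1000)) // prints 1000
}
```

When every retry starts from a fresh budget, only the artificial `limit` ends the loop, which mirrors the retry chain that never stops.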
@tiancaiamao, @cfzjywxk, @jackysp, PTAL.
LGTM
LGTM
We have a better solution in #16849, so I am closing this.
What problem does this PR solve?
Issue Number: refer #15995
Problem Summary:
TestCommitRetryLimit cannot finish once the failpoint is enabled, because it waits for the secondary-commit goroutines to return. A normal user would not be blocked, but would see many goroutines. The cause is that each recursive retry call creates a new backoff and forks a new goroutine, so the backoff never reaches its maximum.
There are also some questions about the 2PC rate limit implementation that need another look.
What is changed and how it works?
What's Changed:
The secondary commit backoff should inherit totalSleep from its parent. However, we cannot use the backoff.fork method here, because the exit of the parent goroutine should not cancel the child retry goroutine.
How it Works:
Deep copy the backoff. This is only a one-line change: https://github.com/pingcap/tidb/pull/16061/files#diff-499c236856cd9ce3300d3f5ccde41a23L506
The other code makes the problem reproducible and testable.
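The difference between forking and deep copying can be sketched as follows. This is a simplified model, not TiDB's actual Backoffer: the `backoffer` type and its `fork` and `clone` methods are hypothetical, and it assumes that a fork derives a cancellable child context which the parent cancels when it returns.

```go
package main

import (
	"context"
	"fmt"
)

// backoffer is a hypothetical, stripped-down model of a retry budget.
type backoffer struct {
	ctx        context.Context
	totalSleep int
}

// fork derives a child whose context ends when the returned cancel func is called.
func (b *backoffer) fork() (*backoffer, context.CancelFunc) {
	ctx, cancel := context.WithCancel(b.ctx)
	return &backoffer{ctx: ctx, totalSleep: b.totalSleep}, cancel
}

// clone deep copies the state and keeps the parent's own context.
func (b *backoffer) clone() *backoffer {
	return &backoffer{ctx: b.ctx, totalSleep: b.totalSleep}
}

// demo reports whether each child is canceled after the parent "exits",
// and the totalSleep inherited by the clone.
func demo() (forkCanceled, cloneCanceled bool, inherited int) {
	parent := &backoffer{ctx: context.Background(), totalSleep: 30}
	forked, cancel := parent.fork()
	cloned := parent.clone()
	cancel() // assumed to be what the parent does when it returns
	return forked.ctx.Err() != nil, cloned.ctx.Err() != nil, cloned.totalSleep
}

func main() {
	fmt.Println(demo()) // prints "true false 30"
}
```

In this model both children inherit the consumed sleep budget, but only the clone can keep retrying after the parent has returned.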
Related changes
Check List
Tests
Side effects
Release note
fix infinitely rebirthed secondary keys commit retry goroutine during tikv error