server: a better way to handle killed connection (#32809) #33076
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
cherry-pick #32809 to release-5.2
You can switch your code base to this Pull Request by using git-extras:
# In tidb repo: git pr https://github.com/pingcap/tidb/pull/33076
After apply modifications, you can push your change to this PR via:
What problem does this PR solve?
Issue Number: close #24031, this PR also reverts #29212.
Problem Summary:
What is changed and how it works?
The root cause of #24031 is that when a connection is idle, the goroutine is blocked at:
https://github.com/pingcap/tidb/blob/4a0d387e1ff1b508bbb60d484d97e4ac2a5ef2c7/server/conn.go#L1068
And the stack:
Because of that, the goroutine is not able to deal with the
KILLED
flag, release the resource it is holding and stop itself immediately.In order to solve that, we need to make
conn.Read()
interruptable but there is no straightforward way in Go to do that. Some references:go/issues/20280
Read()
For the approach introduced in 2 and 3, they are generally the same as this PR: setting
SetReadDeadline
in another goroutine. I cannot find any material describing if doing so is thread-safe, so it should be implementation-dependent and might not be safe, but it might not be a real problem considering we're about the kill the connection and the read buffer/status will be abandoned.Alternatives
SHOW PROCESSLIST
(and infoschema) is modified to show the State asKilled
, as mentioned by @morgo in Killing connections needs cooperation from the to-be-killed connection #24031 (comment), the result ofSHOW PROCESSLIST
can be clear to the user but it doesn't solve the delayed 'release lock' issue(seeCase 2
in 'Manual test' part).waitTimeout
, the code is instead modified to have a hard coded2s
timeout, but loops for up towaitTimeout
retrying a read..., also mentioned by @morgo in Killing connections needs cooperation from the to-be-killed connection #24031 (comment), the potential thread-safe concern can be avoided but we still have at most2s
delay for killing an idle connection and the code would be complicated.SetReadDeadline()
,bufReadConn.Close()
can be another solution. It is basically the same withSetReadDeadline()
IMHO.Check List
Tests
Case1:
Case2:
Side effects
Release note