Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support distributed dead lock detection #263

Merged
merged 8 commits into from
Nov 20, 2019
Merged

support distributed dead lock detection #263

merged 8 commits into from
Nov 20, 2019

Conversation

cfzjywxk
Copy link
Contributor

@cfzjywxk cfzjywxk commented Nov 14, 2019

Issue #257

two related changes could be reviewed separately #269 #270

Support distributed deadlock detection

    1. Implement deadlock proto related RPC server interface
    1. Wrap former detector into new DeadLockDetector, add role to this new detector, and process requests based on current role(leader/follower)
    1. Add detectConn and detectClient to do stream send and recv [this wrapping removed]
    1. Add code path to process first region role change callback named OnRoleChange
    1. Add recycling utility for detector wait map
    1. Move the detector source code and test case from the dependent tidb package to local tikv.detector.go , and add some extra utilities.
    1. Refactor lock waiter manager, remove the queue mutex

The storage engine still has some problems #267 #268, the distributed tests are not done yet.

@cfzjywxk
Copy link
Contributor Author

@coocood PTAL distributed tests not done yet, thx

tikv/region.go Outdated Show resolved Hide resolved
tikv/region.go Outdated Show resolved Hide resolved
tikv/region.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
@cfzjywxk
Copy link
Contributor Author

@coocood @bobotu PTAL

tikv/server.go Outdated Show resolved Hide resolved
tikv/mvcc.go Outdated Show resolved Hide resolved
tikv/region.go Outdated Show resolved Hide resolved
tikv/mvcc.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
@cfzjywxk cfzjywxk changed the title support distributed dead lock detection (WIP) support distributed dead lock detection Nov 17, 2019
@cfzjywxk cfzjywxk requested a review from coocood November 17, 2019 14:25
tikv/server.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated
log.Errorf("refresh leader addr failed, err=%v", refreshErr)
}
time.Sleep(3 * time.Second)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed req will be discarded when retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the request will be timeout and the pessimistic lock error will be reported instead of possible deadlock error, tidb may retry the pessimistic lock request

tikv/deadlock.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
@cfzjywxk
Copy link
Contributor Author

@bobotu @coocood PTAL

  1. Detector code and test case ported from tidb dependent package, the slice is changed to list and some extra utils are added like entry size counter.
  2. detectionConn and detectClient removed.
  3. Entry recycling logic added

tikv/detector.go Outdated Show resolved Hide resolved
@cfzjywxk
Copy link
Contributor Author

conflicts resolved.

  1. Edge expire validation added in doDetect
  2. OnRoleChange implementation based on newly region event handler goroutine

@coocood @bobotu PTAL

tikv/detector.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated Show resolved Hide resolved
tikv/deadlock.go Outdated
}

// Detect will process the detection request on local detector
func (dt *DeadlockDetector) Detect(txnTs uint64, waitForTxnTs uint64, keyHash uint64) error {
Copy link
Collaborator

@coocood coocood Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the local detector optimization would make the code much simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used mainly for Server.Detect RPC service handler

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can separate to DetectorClient and DetectorServer, DetectorClient doesn't need to contain the Detector type.

Copy link
Contributor Author

@cfzjywxk cfzjywxk Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeadlockDetector ->(split into) DetectorClient and DetectorServer , and
DetectorServer wraps the detector type and do real local detection(used by rpc handler),
DetectorClient warps connection related things and is responcible for send and recv
do I understand right? Seems same code lines 😅

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger, refactored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coocood only detection client interface kept, others removed

@coocood
Copy link
Collaborator

coocood commented Nov 20, 2019

LGTM

@coocood coocood merged commit e687251 into ngaut:master Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants