Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schedule, checker: add ruleChecker #2042

Merged
merged 8 commits into from
Dec 24, 2019
Merged

Conversation

disksing
Copy link
Contributor

What problem does this PR solve?

Introduce RuleChecker to fix region according to placement rules.

What is changed and how it works?

  • add ruleChecker
  • add tests
  • adapt operator.Builder

Check List

Tests

  • Unit test

Signed-off-by: disksing <i@disksing.com>
Signed-off-by: disksing <i@disksing.com>
Signed-off-by: disksing <i@disksing.com>
Signed-off-by: disksing <i@disksing.com>
@disksing disksing added the component/schedule Scheduling logic. label Dec 20, 2019
@disksing disksing added this to the v4.0.0-beta milestone Dec 20, 2019
@codecov-io
Copy link

codecov-io commented Dec 20, 2019

Codecov Report

Merging #2042 into master will decrease coverage by 0.02%.
The diff coverage is 76.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2042      +/-   ##
==========================================
- Coverage   76.87%   76.84%   -0.03%     
==========================================
  Files         184      185       +1     
  Lines       18653    18829     +176     
==========================================
+ Hits        14339    14470     +131     
- Misses       3227     3253      +26     
- Partials     1087     1106      +19
Impacted Files Coverage Δ
server/schedule/operator/operator.go 95.91% <ø> (ø) ⬆️
server/schedule/checker/rule_checker.go 75.62% <75.62%> (ø)
server/schedule/operator/builder.go 95.12% <82.35%> (-0.67%) ⬇️
pkg/metricutil/metricutil.go 90.62% <0%> (-9.38%) ⬇️
server/schedulers/base_scheduler.go 60.86% <0%> (-8.7%) ⬇️
server/schedulers/random_merge.go 60.71% <0%> (-3.58%) ⬇️
server/member/leader.go 75.51% <0%> (-2.56%) ⬇️
server/region_syncer/client.go 81.92% <0%> (-2.41%) ⬇️
pkg/mock/mockhbstream/mockhbstream.go 89.23% <0%> (-1.54%) ⬇️
server/schedulers/adjacent_region.go 73.29% <0%> (-1.05%) ⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9ae11e2...6d57b94. Read the comment docs.

server/schedule/checker/rule_checker.go Outdated Show resolved Hide resolved
server/schedule/checker/rule_checker_test.go Outdated Show resolved Hide resolved
server/schedule/checker/rule_checker.go Outdated Show resolved Hide resolved
log.Warn("lost the store, maybe you are recovering the PD cluster", zap.Uint64("store-id", peer.StoreId))
return false
}
return !store.IsUp()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to use IsOffline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we treat tomestone as offline too. (it happens when force bury a store before moing out all regions). replicaChecker's behaviour is like this too: https://github.com/pingcap/pd/blob/4d65bbefbc6db6e92fee33caa97415274512757a/server/schedule/checker/replica_checker.go#L216-L218

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this function named isOfflinePeer which may be a little bit confused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

em. do you have better suggestion? (replicaChecker uses checkOfflinePeer as function name too...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think IsOfflineOrTomestone looks too verbose 😢

server/schedule/checker/rule_checker.go Show resolved Hide resolved
disksing and others added 2 commits December 20, 2019 17:47
Signed-off-by: disksing <i@disksing.com>

Co-Authored-By: Ryan Leung <rleungx@gmail.com>
Signed-off-by: disksing <i@disksing.com>
Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM.

server/schedule/checker/rule_checker.go Show resolved Hide resolved
server/schedule/checker/rule_checker.go Outdated Show resolved Hide resolved
server/schedule/checker/rule_checker.go Show resolved Hide resolved

// SelectStoreToReplacePeerByRule selects a store to replace a region peer in order to fit the placement rule.
func SelectStoreToReplacePeerByRule(scope string, cluster opt.Cluster, region *core.RegionInfo, fit *placement.RegionFit, rf *placement.RuleFit, peer *metapb.Peer, filters ...filter.Filter) *core.StoreInfo {
rf2 := removePeerFromRuleFit(rf, peer)
Copy link
Contributor

@nolouch nolouch Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What purpose for this, don't want to remove looseMatchPeer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to replicaChecker.SelectBestReplacementStore, replace is done by
copy region with the old peer removed + add a new peer to the copied region

The difference is in ruleChecker, replace happens inside a rule (remove a peer, add another peer fits the same rule), so the replace process besomes
copy RuleFit with the old peer removed + add a new peer to the copied RuleFit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got

Signed-off-by: disksing <i@disksing.com>
@disksing
Copy link
Contributor Author

PTAL @rleungx

Copy link
Member

@rleungx rleungx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rleungx rleungx added the status/can-merge Indicates a PR has been approved by a committer. label Dec 24, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Dec 24, 2019

/run-all-tests

@sre-bot sre-bot merged commit 71d95d6 into tikv:master Dec 24, 2019
@disksing disksing deleted the placement13 branch December 24, 2019 12:05
@disksing disksing mentioned this pull request Dec 25, 2019
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/schedule Scheduling logic. status/can-merge Indicates a PR has been approved by a committer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants