Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coordinator: use a healthy region count to start coordinator #7044

Merged
merged 25 commits into from
Oct 13, 2023

Conversation

HuSharp
Copy link
Member

@HuSharp HuSharp commented Sep 5, 2023

What problem does this PR solve?

Issue Number: Close #6988 ,Close #7016

What is changed and how does it work?

  • different source of region, regard FromSync and FromStorage as the stale region
  • add a stale region count in regionTree to start coordinator first.
  • atomic add/sub stale regions number by region updated

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Release note

use a healthy region count to avoid starting coordinator timeout.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Sep 5, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • bufferflies
  • nolouch

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Sep 5, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed labels Sep 5, 2023
@ti-chi-bot ti-chi-bot bot requested review from JmPotato and rleungx September 5, 2023 03:39
@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 5, 2023
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 6, 2023
@HuSharp HuSharp changed the title coordinator: use a region count to start coordinator coordinator: use a stale region count to start coordinator Sep 6, 2023
@HuSharp HuSharp force-pushed the fix_coordinator branch 2 times, most recently from d34b261 to 4b862b9 Compare September 6, 2023 03:42
Signed-off-by: husharp <jinhao.hu@pingcap.com>
@HuSharp HuSharp marked this pull request as ready for review September 6, 2023 04:49
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2023

const (
// InDisk means region is stale.
InDisk RegionSource = iota
Copy link
Member

@rleungx rleungx Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about FromLoad or FromStorage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update to FromStorage

Signed-off-by: husharp <jinhao.hu@pingcap.com>
@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #7044 (a9d6e3e) into master (0adb86f) will increase coverage by 0.07%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##           master    #7044      +/-   ##
==========================================
+ Coverage   74.62%   74.69%   +0.07%     
==========================================
  Files         442      442              
  Lines       47660    47681      +21     
==========================================
+ Hits        35565    35615      +50     
+ Misses       8972     8947      -25     
+ Partials     3123     3119       -4     
Flag Coverage Δ
unittests 74.69% <94.44%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

pkg/core/region.go Outdated Show resolved Hide resolved
pkg/core/region.go Outdated Show resolved Hide resolved
pkg/core/region.go Outdated Show resolved Hide resolved
Signed-off-by: husharp <jinhao.hu@pingcap.com>
pkg/core/region_tree.go Outdated Show resolved Hide resolved
@@ -342,3 +346,20 @@ func (t *regionTree) TotalWriteRate() (bytesRate, keysRate float64) {
}
return t.totalWriteBytesRate, t.totalWriteKeysRate
}

func (t *regionTree) AtomicAddStaleRegionCnt() {
if t.length() == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to check the length here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we do not need to check it after moving staleRegionCnt into regionsInfo

pkg/schedule/prepare_checker.go Outdated Show resolved Hide resolved
pkg/syncer/client.go Outdated Show resolved Hide resolved
log.Info("stale meta region number is satisfied, skip prepare checker", zap.Int64("stale-region", c.GetStaleRegionCnt()), zap.Int("total-region", c.GetTotalRegionCount()))
checker.prepared = true
return true
}
// The number of active regions should be more than total region of all stores * collectFactor
if float64(c.GetTotalRegionCount())*collectFactor > float64(checker.sum) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need it?

Copy link
Member Author

@HuSharp HuSharp Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is better to remove them in another pr? This pr just focuses on accelerating coordinator.
What do u think :)

Comment on lines 865 to 866
// FromStorage means this region's meta info might be stale.
r.AtomicAddStaleRegionCnt()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we need to make sure that only reload from storage will use CheckAndPutRegion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by return regions num

)

// IsSourceStale means this region's meta info might be stale.
func (r *RegionInfo) IsSourceStale() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about not using stale/fresh? Just use LoadedFromStorage and !LoadedFromStorage. Or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: husharp <jinhao.hu@pingcap.com>
@HuSharp HuSharp removed needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. labels Oct 11, 2023
Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 13, 2023
@nolouch
Copy link
Contributor

nolouch commented Oct 13, 2023

/merge

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Oct 13, 2023

@nolouch: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Oct 13, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 6281684

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 13, 2023
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Oct 13, 2023

@HuSharp: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit 2ba7ed4 into tikv:master Oct 13, 2023
@HuSharp HuSharp deleted the fix_coordinator branch October 16, 2023 08:19
disksing added a commit to oh-my-tidb/pd that referenced this pull request Nov 8, 2023
close tikv#6988, close tikv#7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@HuSharp HuSharp added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Nov 14, 2023
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #7363.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Nov 14, 2023
close tikv#6988, close tikv#7016

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@HuSharp HuSharp added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Nov 14, 2023
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #7364.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Nov 14, 2023
close tikv#6988, close tikv#7016

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot added a commit that referenced this pull request Nov 24, 2023
…7363)

close #6988, close #7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit that referenced this pull request Dec 1, 2023
…7364)

close #6988, close #7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 1, 2023
close tikv#6988, close tikv#7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
6 participants