Volume Replacement causes high latency/jitter to the clients due to excessive IO and CPU usage (#6069) #6075

Merged

Conversation

ti-chi-bot
Member

This is an automated cherry-pick of #6069

What problem does this PR solve?

Volume replacement causes high latency/jitter for clients due to excessive IO and CPU usage, because the schedule.store-limit.xyz.remove-peer limit is set to unlimited. Effectively, the limit becomes num_available_tikv_pods * schedule.store-limit.xyz.add-peer. The problem is that the TiKV store is still serving active leaders while this happens, which causes a high latency impact on clients.
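To make the arithmetic above concrete, here is a minimal Go sketch of the effective limit. The function name `effectiveRemovePeerRate` and the sample values are hypothetical and chosen only for illustration; they are not taken from this PR or from PD.

```go
package main

import "fmt"

// effectiveRemovePeerRate is a hypothetical helper that mirrors the formula
// from the description: with remove-peer set to unlimited, the store being
// replaced is bounded only by what the other stores can add, i.e.
// num_available_tikv_pods * add-peer.
func effectiveRemovePeerRate(availableTiKVPods int, addPeerLimit float64) float64 {
	return float64(availableTiKVPods) * addPeerLimit
}

func main() {
	// Illustrative values only (assumptions, not measurements).
	pods := 5
	addPeer := 15.0
	fmt.Println(effectiveRemovePeerRate(pods, addPeer))
}
```

The point of the sketch is that the rate grows with cluster size, so a larger cluster drains the store faster while it is still serving leaders.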

This fix evicts the leaders from the TiKV pod/store before proceeding with volume replacement on that pod.

Ref: tikv/pd#4099

What is changed and how does it work?

Evict region leaders from the TiKV store before proceeding with volume replacement.
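The ordering described above could be sketched roughly as follows. This is not the code from this PR: the `leaderEvictor` interface, its method names, `readyForVolumeReplacement`, and the in-memory `fakePD` are all hypothetical, and the sketch assumes a reconcile-style caller that retries until the store reports no leaders.

```go
package main

import "fmt"

// leaderEvictor is a hypothetical abstraction over the PD operations this
// sketch needs; the real operator's client may look different.
type leaderEvictor interface {
	BeginEvictLeader(storeID uint64) error
	LeaderCount(storeID uint64) (int, error)
}

// readyForVolumeReplacement requests leader eviction for the store and
// reports whether the store no longer holds any leaders. The caller is
// expected to retry later while it returns false.
func readyForVolumeReplacement(pd leaderEvictor, storeID uint64) (bool, error) {
	if err := pd.BeginEvictLeader(storeID); err != nil {
		return false, fmt.Errorf("begin evict leader on store %d: %w", storeID, err)
	}
	n, err := pd.LeaderCount(storeID)
	if err != nil {
		return false, fmt.Errorf("get leader count of store %d: %w", storeID, err)
	}
	return n == 0, nil
}

// fakePD is an in-memory stand-in used only to exercise the sketch.
type fakePD struct {
	evicting map[uint64]bool
	leaders  map[uint64]int
}

func (f *fakePD) BeginEvictLeader(storeID uint64) error {
	f.evicting[storeID] = true
	return nil
}

func (f *fakePD) LeaderCount(storeID uint64) (int, error) {
	return f.leaders[storeID], nil
}

func main() {
	pd := &fakePD{evicting: map[uint64]bool{}, leaders: map[uint64]int{1: 3}}

	ready, _ := readyForVolumeReplacement(pd, 1)
	fmt.Println("ready while leaders remain:", ready)

	pd.leaders[1] = 0 // simulate the scheduler having moved all leaders away
	ready, _ = readyForVolumeReplacement(pd, 1)
	fmt.Println("ready after eviction completes:", ready)
}
```

Returning a boolean instead of blocking keeps the check cheap to call repeatedly, which suits a controller loop. A blocking wait with a timeout would be an equally valid design.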

Code changes

  • [x] Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.

Fix high latency/jitter due to excessive IO and CPU usage during Volume Replacement.

Contributor

ti-chi-bot bot commented Feb 13, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hanlins for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot requested review from KanShiori and shonge February 13, 2025 03:14
@sre-bot
Contributor

sre-bot commented Feb 13, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@csuzhangxc csuzhangxc merged commit f92879f into pingcap:release-1.5 Feb 13, 2025
4 of 6 checks passed

4 participants