-
Notifications
You must be signed in to change notification settings - Fork 725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grpc: set MaxConcurrentStreams to avoid sudden traffic spikes that lead to PD OOM #8977
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: okJiang <819421878@qq.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@okJiang: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8977 +/- ##
==========================================
- Coverage 76.31% 76.30% -0.02%
==========================================
Files 465 465
Lines 70547 70553 +6
==========================================
- Hits 53839 53834 -5
- Misses 13361 13370 +9
- Partials 3347 3349 +2
Flags with carried forward coverage won't be shown. Click here to find out more. |
@@ -243,6 +244,9 @@ const ( | |||
defaultGCTunerThreshold = 0.6 | |||
minGCTunerThreshold = 0 | |||
maxGCTunerThreshold = 0.9 | |||
// If concurrentStreams reaches 600k, the memory usage is about 40GB. To |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any tests to support the conclusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conclusion comes from a practical case where the cluster contains millions of regions. And its requests are ScanRegions
mainly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMIIW, it only limits the concurrency for one connection, not total concurrency on the server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In simple terms, should we consider this sudden surge in traffic as an anomaly? If so, I think we can set this parameter to protect PD. Do you know in what normal circumstances PD would experience such high traffic? For example, 16GB PD and 160k requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// MaxConcurrentStreams specifies the maximum number of concurrent
// streams that each client can open at a time.
If users make massive requests using multiple clients, we have no way to limit it...
What problem does this PR solve?
Issue Number: Close #8882, ref #4480
What is changed and how does it work?
Added
MaxConcurrentStreams
to limit the request concurrency. This is a self-protection mechanism of PD. Once the number of concurrent requests reaches the limit, gRPC will wait for the ongoing requests to finish and allocate resources to the waiting requests. Therefore, in this situation, the request time may slow down, but it will provide better robustness.PS: This parameter cannot be modified during runtime, so we cannot support changing it by pd-ctl. If you think it should be configurable, please comment.
Check List
Tests
Release note