resource_control: support calibrate resource #42165
@nolouch @BornChanger PTAL
/test all
/retest
Overall LGTM.
readBytes: units.MiB / 2, // 0.5MiB
writeBytes: units.MiB, // 1MiB
readReqCount: 300,
writeReqCount: 1750,
Does this mean one core can serve 1750 requests here? Maybe add more comments.
Yes. It is based on benchmark results. I added a comment on the baseResourceCost struct.
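For readers following the thread, here is a minimal sketch of how the four values in the snippet could be read as a per-core baseline. Only the field names and numbers come from the snippet above; `miB`, `tpccBaseCost`, and `scaleByCores` are hypothetical names for illustration, and the actual struct in the PR may have more fields.

```go
package main

import "fmt"

// miB stands in for units.MiB from the reviewed snippet.
const miB = 1 << 20

// baseResourceCost holds the benchmarked resource usage attributed to
// one TiKV CPU core. Only the four fields visible in the review snippet
// are modeled here.
type baseResourceCost struct {
	readBytes     uint64 // bytes read per core
	writeBytes    uint64 // bytes written per core
	readReqCount  uint64 // read requests per core
	writeReqCount uint64 // write requests per core
}

// tpccBaseCost returns the values shown in the snippet (hypothetical helper).
func tpccBaseCost() baseResourceCost {
	return baseResourceCost{
		readBytes:     miB / 2,
		writeBytes:    miB,
		readReqCount:  300,
		writeReqCount: 1750,
	}
}

// scaleByCores multiplies every dimension by the number of cores,
// relying on the linear relationship described in the PR description.
func scaleByCores(c baseResourceCost, cores uint64) baseResourceCost {
	return baseResourceCost{
		readBytes:     c.readBytes * cores,
		writeBytes:    c.writeBytes * cores,
		readReqCount:  c.readReqCount * cores,
		writeReqCount: c.writeReqCount * cores,
	}
}

func main() {
	fmt.Printf("%+v\n", scaleByCores(tpccBaseCost(), 8))
}
```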
executor/calibrate_resource.go
Outdated
return err
}

workload := "tpcc"
What about using a const or defined type?
done
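One possible shape for the suggestion above, shown only as a sketch: the names `workloadType`, `workloadTPCC`, and `parseWorkload` are assumptions for illustration and are not taken from the merged code.

```go
package main

import (
	"fmt"
	"strings"
)

// workloadType is a hypothetical defined type for supported workloads.
type workloadType string

// workloadTPCC replaces the bare "tpcc" string literal from the snippet.
const workloadTPCC workloadType = "tpcc"

// parseWorkload is a hypothetical helper that accepts only known
// workloads, matching case-insensitively.
func parseWorkload(s string) (workloadType, error) {
	w := workloadType(strings.ToLower(s))
	switch w {
	case workloadTPCC:
		return w, nil
	default:
		return "", fmt.Errorf("unsupported workload %q", s)
	}
}

func main() {
	w, err := parseWorkload("TPCC")
	fmt.Println(w, err)
	_, err = parseWorkload("sysbench")
	fmt.Println(err)
}
```

A defined type lets the compiler catch a plain string passed where a workload is expected, and gives future workloads one obvious place to be listed.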
executor/calibrate_resource.go
Outdated
for i, f := range fields {
switch f.ColumnAsName.L {
case "instance":
//instanceIdx = i
Please clean this up.
The result could be inaccurate.
It relies on many assumptions: that the bottleneck is TiDB or TiKV CPU, the assumed workload, the performance on different hardware, and so on.
Yes. This is a limitation of the current implementation. We plan to extend this command to estimate the RU capacity dynamically based on the user's workload, which should be more useful for the user.
/merge
This pull request has been accepted and is ready to merge. Commit hash: a6d48fb
What problem does this PR solve?
Issue Number: ref #38825
Problem Summary:
What is changed and how it works?
This PR adds a new statement, calibrate resource, to estimate the total Request Unit (RU) capacity of the current cluster. Because RU usage depends on how a workload consumes resources, the maximum RU differs from workload to workload. The maximum RU estimated by this PR is therefore based on a given workload, TPC-C; we may support other workloads (e.g. sysbench) in the future.
In general, the bottleneck of a cluster can be one of TiDB CPU, TiKV CPU, or TiKV IO bandwidth. Currently, we can get the exact IO bandwidth, and for most workloads IO is unlikely to be the bottleneck. So here we only consider TiDB CPU or TiKV CPU as the bottleneck.
For a specified workload, the consumption of each resource is linearly correlated with the others. So this PR uses pre-benchmarked data for each resource dimension to calculate the RU cost per TiKV CPU. If TiKV CPU is the bottleneck, then Max RU = max_ru_per_1_kv_cpu * Total_TiKV_CPU; if TiDB CPU is the bottleneck, we just decrease the total TiKV CPU by a certain proportion.
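The bottleneck rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `estimateMaxRU` and `tidbToKVCPURatio` (TiDB cores consumed per busy TiKV core under the benchmarked workload) are hypothetical names, and every number in `main` is a placeholder rather than benchmark data.

```go
package main

import "fmt"

// estimateMaxRU applies Max RU = ruPerKVCPU * effective TiKV CPU.
// When TiDB CPU cannot keep all TiKV cores busy, the usable TiKV CPU is
// reduced to what TiDB can drive (assumed model: a fixed TiDB-to-TiKV
// CPU ratio for the workload).
func estimateMaxRU(ruPerKVCPU, totalKVCPU, totalTiDBCPU, tidbToKVCPURatio float64) float64 {
	effectiveKVCPU := totalKVCPU
	if drivable := totalTiDBCPU / tidbToKVCPURatio; drivable < effectiveKVCPU {
		// TiDB CPU is the bottleneck: shrink the TiKV CPU proportionally.
		effectiveKVCPU = drivable
	}
	return ruPerKVCPU * effectiveKVCPU
}

func main() {
	// Placeholder inputs: 1000 RU per TiKV core, 24 TiKV cores,
	// and 0.5 TiDB cores needed per TiKV core.
	fmt.Println(estimateMaxRU(1000, 24, 16, 0.5)) // TiKV CPU is the limit
	fmt.Println(estimateMaxRU(1000, 24, 8, 0.5))  // TiDB CPU is the limit
}
```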
The PR calculates the RU cost of each resource dimension separately, so the total RU can be calculated with a custom RU config and the expected RU capacity reflects any RU config change.
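To illustrate why keeping the dimensions separate matters, here is a small sketch that prices each dimension with a configurable weight. The type and field names (`usagePerKVCPU`, `ruConfig`) and all weights are assumptions for illustration only; they are not the cluster's real RU config values, and the CPU dimension is left out for brevity.

```go
package main

import "fmt"

// usagePerKVCPU is the benchmarked usage per TiKV CPU core, using the
// four values from the review snippet.
type usagePerKVCPU struct {
	readBytes, writeBytes       float64
	readReqCount, writeReqCount float64
}

// ruConfig holds hypothetical per-unit RU weights.
type ruConfig struct {
	readBaseCost, writeBaseCost   float64 // RU per request
	readBytesCost, writeBytesCost float64 // RU per byte
}

// ruPerKVCPU prices each dimension separately and sums the results, so
// a change to any single weight shows up directly in the estimate.
func ruPerKVCPU(u usagePerKVCPU, c ruConfig) float64 {
	return u.readReqCount*c.readBaseCost +
		u.writeReqCount*c.writeBaseCost +
		u.readBytes*c.readBytesCost +
		u.writeBytes*c.writeBytesCost
}

func main() {
	usage := usagePerKVCPU{
		readBytes:     512 * 1024,
		writeBytes:    1024 * 1024,
		readReqCount:  300,
		writeReqCount: 1750,
	}
	// Placeholder weights, chosen only to keep the arithmetic simple.
	cfg := ruConfig{
		readBaseCost:   1,
		writeBaseCost:  1,
		readBytesCost:  1.0 / 65536,
		writeBytesCost: 1.0 / 1024,
	}
	fmt.Println(ruPerKVCPU(usage, cfg))

	cfg.writeBaseCost = 2 // a custom config change
	fmt.Println(ruPerKVCPU(usage, cfg))
}
```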
The current SQL UI is as follows (we may add more information in a future version):
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.