Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: seperater metrics as internal and external for slo-controller and koordlet #1807

Merged
merged 1 commit into from
Jan 24, 2024

Conversation

zwzhang0107
Copy link
Contributor

@zwzhang0107 zwzhang0107 commented Jan 2, 2024

Ⅰ. Describe what this PR does

seperate koordlet/slo-controller metrics as internal and external
ExternalHTTPPath: for end users, such as batch resource util (also compatible with old URL /metircs for koordlet)
InternalHTTPPath: for cluster admin, such as performance of components.
DefaultHTTPPath: merge all internal, external and controller runtime metrics.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

Copy link

codecov bot commented Jan 2, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (da6a05c) 67.23% compared to head (f021958) 67.26%.
Report is 2 commits behind head on main.

Files Patch % Lines
pkg/util/metrics/merged_gather.go 72.72% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1807      +/-   ##
==========================================
+ Coverage   67.23%   67.26%   +0.03%     
==========================================
  Files         407      410       +3     
  Lines       45644    45662      +18     
==========================================
+ Hits        30687    30715      +28     
+ Misses      12741    12731      -10     
  Partials     2216     2216              
Flag Coverage Δ
unittests 67.26% <90.32%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zwzhang0107 zwzhang0107 force-pushed the add-metric branch 2 times, most recently from a403674 to 9b189d1 Compare January 2, 2024 09:48
@zwzhang0107 zwzhang0107 changed the title koordlet: seperater metrics as internal and external koordlet/slo-controller: seperater metrics as internal and external Jan 2, 2024
Copy link
Member

@saintube saintube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the deploy templates under the /config since the metrics paths are going to change.

cmd/koord-manager/main.go Show resolved Hide resolved
@ZiMengSheng
Copy link
Contributor

/milestone v1.5

@zwzhang0107
Copy link
Contributor Author

Need to update the deploy templates under the /config since the metrics paths are going to change.

the url path has not defined yet both in config and charts

@eahydra
Copy link
Member

eahydra commented Jan 6, 2024

Is it really necessary to split it into two metrics urls? Users also need to configure two prometheus configurations for slo-controller, and this indicator must be distinguished when adding indicators in future code. Moreover, only two components, koord-manager and koordlet, are involved here. Do scheduler/descheduler also need to be followed? I don't think such a distinction should be made, and it is not a common or recommended practice.

@zwzhang0107
Copy link
Contributor Author

Is it really necessary to split it into two metrics urls? Users also need to configure two prometheus configurations for slo-controller, and this indicator must be distinguished when adding indicators in future code. Moreover, only two components, koord-manager and koordlet, are involved here. Do scheduler/descheduler also need to be followed? I don't think such a distinction should be made, and it is not a common or recommended practice.

@eahydra Yes, it is necessary. any extended metric will be charged(ref https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/dynamic-resource-overcommitment?spm=a2c4g.11186623.0.0.74d118ab3vDxaM#ef961016a2uds), and for end users they only need care about external metrics, this will cost a lot if we do not separate internal and external after we add new metrics about performance.

scheduler/descheduler should also split if they have the same senario in future.

@eahydra
Copy link
Member

eahydra commented Jan 8, 2024

Is it really necessary to split it into two metrics urls? Users also need to configure two prometheus configurations for slo-controller, and this indicator must be distinguished when adding indicators in future code. Moreover, only two components, koord-manager and koordlet, are involved here. Do scheduler/descheduler also need to be followed? I don't think such a distinction should be made, and it is not a common or recommended practice.

@eahydra Yes, it is necessary. any extended metric will be charged(ref https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/dynamic-resource-overcommitment?spm=a2c4g.11186623.0.0.74d118ab3vDxaM#ef961016a2uds), and for end users they only need care about external metrics, this will cost a lot if we do not separate internal and external after we add new metrics about performance.

scheduler/descheduler should also split if they have the same senario in future.

This seems to be solved through the reable_config mechanism of prometheus. Only need to configure the metrics regex and add action=keep.

@zwzhang0107 zwzhang0107 force-pushed the add-metric branch 2 times, most recently from f75a5c9 to fca8324 Compare January 8, 2024 06:40
@zwzhang0107
Copy link
Contributor Author

reable_config

this is not supported for

Is it really necessary to split it into two metrics urls? Users also need to configure two prometheus configurations for slo-controller, and this indicator must be distinguished when adding indicators in future code. Moreover, only two components, koord-manager and koordlet, are involved here. Do scheduler/descheduler also need to be followed? I don't think such a distinction should be made, and it is not a common or recommended practice.

@eahydra Yes, it is necessary. any extended metric will be charged(ref https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/dynamic-resource-overcommitment?spm=a2c4g.11186623.0.0.74d118ab3vDxaM#ef961016a2uds), and for end users they only need care about external metrics, this will cost a lot if we do not separate internal and external after we add new metrics about performance.
scheduler/descheduler should also split if they have the same senario in future.

This seems to be solved through the reable_config mechanism of prometheus. Only need to configure the metrics regex and add action=keep.

prometheus.yaml is not configurable for end user by default in product like ACK

@saintube
Copy link
Member

Ⅰ. Describe what this PR does

seperate koordlet/slo-controller metrics as internal and external /external-metrics: for end users, such as batch resource util /metrics: for cluster admin, such as performance of components.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

It seems the external metrics are using the path /metrics.

@zwzhang0107
Copy link
Contributor Author

Ⅰ. Describe what this PR does

seperate koordlet/slo-controller metrics as internal and external /external-metrics: for end users, such as batch resource util /metrics: for cluster admin, such as performance of components.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

It seems the external metrics are using the path /metrics.

updated

@zwzhang0107 zwzhang0107 force-pushed the add-metric branch 3 times, most recently from b90ad33 to 3ad578b Compare January 24, 2024 03:46
@zwzhang0107 zwzhang0107 changed the title koordlet/slo-controller: seperater metrics as internal and external metrics: seperater metrics as internal and external Jan 24, 2024
@zwzhang0107 zwzhang0107 changed the title metrics: seperater metrics as internal and external metrics: seperater metrics as internal and external for slo-controller and koordlet Jan 24, 2024
Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
Copy link
Member

@saintube saintube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@koordinator-bot koordinator-bot bot added the lgtm label Jan 24, 2024
Copy link
Member

@FillZpp FillZpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@zwzhang0107
Copy link
Contributor Author

/approve

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FillZpp, saintube, zwzhang0107

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 07e51fa into koordinator-sh:main Jan 24, 2024
20 checks passed
saintube pushed a commit to saintube/koordinator that referenced this pull request Feb 22, 2024
…r and koordlet (koordinator-sh#1807)

Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
saintube pushed a commit that referenced this pull request Feb 26, 2024
…r and koordlet (#1807)

Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
saintube pushed a commit to saintube/koordinator that referenced this pull request Feb 26, 2024
…r and koordlet (koordinator-sh#1807)

Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
saintube pushed a commit to saintube/koordinator that referenced this pull request Feb 26, 2024
…r and koordlet (koordinator-sh#1807)

Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
saintube pushed a commit that referenced this pull request Feb 26, 2024
…r and koordlet (#1807)

Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
ls-2018 pushed a commit to ls-2018/koordinator that referenced this pull request Mar 25, 2024
…r and koordlet (koordinator-sh#1807)

Signed-off-by: 佑祎 <zzw261520@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants