HOSTEDCP-1044: Add nodepools telemetry metrics for HyperShift #2265

Merged

muraee (Contributor) commented Feb 16, 2024

  • hypershift:nodepools:size
  • hypershift:nodepools:available_replicas

requires: openshift/hypershift#3593

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 16, 2024
openshift-ci-robot (Contributor) commented Feb 16, 2024

@muraee: This pull request references HOSTEDCP-1044 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target only the "4.16.0" version, but multiple target versions were set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
muraee (Contributor Author) commented Apr 30, 2024

/retest-required

muraee (Contributor Author) commented Apr 30, 2024

cc @simonpasquier

simonpasquier (Contributor) left a comment

From openshift/hypershift#3593 (comment):

> changed to aggregate per HostedCluster, the maximum cardinality would be between 80-100

So at most a single management cluster running the hypershift operator could generate 2 x 100 = 200 series. It's still way above the "automatically approved" limit (https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#request-approval).

cc @jan--f @moadz

@@ -921,6 +921,16 @@ data:
# platform:hypershift_nodepools:max is the total number of nodepools managed by the hypershift operator by cluster platform
- '{__name__="platform:hypershift_nodepools:max"}'
#
# owners: (@openshift/team-hypershift-maintainers)
#
# cluster_name:hypershift_nodepools_size:sum is the total number of desired nodepool replicas managed by the hypershift operator per HostedCluster

IIUC HostedCluster = (cluster_name, exported_namespace) labels. Could these values contain identifying information?

muraee (Contributor Author) replied:

I don't think so
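
To illustrate the per-HostedCluster aggregation this review thread is about, here is a minimal Python sketch. It assumes hypothetical per-NodePool samples carrying the `cluster_name` and `exported_namespace` labels mentioned above; the sample values and the function name are made up for illustration and are not taken from the HyperShift code or from the actual recording rule.

```python
from collections import defaultdict

# Hypothetical per-NodePool samples:
# (cluster_name, exported_namespace, nodepool, desired_replicas).
SAMPLES = [
    ("prod-a", "clusters", "workers-1", 3),
    ("prod-a", "clusters", "workers-2", 2),
    ("dev-b", "clusters", "workers-1", 1),
]


def sum_by_hosted_cluster(samples):
    """Sum replicas per (cluster_name, exported_namespace), dropping the nodepool label."""
    totals = defaultdict(int)
    for cluster_name, namespace, _nodepool, replicas in samples:
        totals[(cluster_name, namespace)] += replicas
    return dict(totals)


if __name__ == "__main__":
    for key, total in sorted(sum_by_hosted_cluster(SAMPLES).items()):
        print(key, total)
```

Only the two grouping labels survive the aggregation, which is why the question of whether their values can identify a customer matters for telemetry.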

simonpasquier (Contributor) commented

Another question: is there a need to correlate these metrics with the telemetry metrics emitted from the guest cluster?

muraee (Contributor Author) commented May 3, 2024

> So at most a single management cluster running the hypershift operator could generate 2 x 100 = 200 series

That is per HostedCluster; multiple HostedClusters can be created in the same management cluster.

> Another question: is there a need to correlate these metrics with the telemetry metrics emitted from the guest cluster?

No need. cc @zanetworker

simonpasquier (Contributor) commented

> > So at most a single management cluster running the hypershift operator could generate 2 x 100 = 200 series
>
> per HostedCluster, multiple HostedClusters can be created in the same management cluster.

I don't get it. Say that a management cluster runs 100 HostedClusters; then the cardinality of cluster_name:hypershift_nodepools_size:sum will be 100 (and the same for cluster_name:hypershift_nodepools_available_replicas:sum, hence the 200 series). Can you confirm?
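
The arithmetic in this comment can be written down as a small Python sketch. It assumes, as stated above, that each of the two recording rules yields exactly one series per HostedCluster; the function name is hypothetical and only serves the illustration.

```python
# Recording rules named in this PR; each is assumed to yield
# one series per HostedCluster.
RULES = (
    "cluster_name:hypershift_nodepools_size:sum",
    "cluster_name:hypershift_nodepools_available_replicas:sum",
)


def worst_case_series(hosted_clusters: int, rules=RULES) -> int:
    """Upper bound on telemetry series emitted by one management cluster."""
    if hosted_clusters < 0:
        raise ValueError("hosted_clusters must be non-negative")
    return hosted_clusters * len(rules)


if __name__ == "__main__":
    print(worst_case_series(100))  # 2 x 100 = 200
```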

simonpasquier (Contributor) commented

cc @zanetworker, see #2265 (comment) (the previous mention failed).

muraee (Contributor Author) commented May 3, 2024

> hence the 200 series). Can you confirm?

@simonpasquier exactly right. Sorry, I was thinking of something else.

zanetworker commented

> Another question: is there a need to correlate these metrics with the telemetry metrics emitted from the guest cluster?

No need, @simonpasquier.

muraee (Contributor Author) commented May 8, 2024

/retest-required

- cluster_name:hypershift_nodepools_size:sum
- cluster_name:hypershift_nodepools_available_replicas:sum
muraee (Contributor Author) commented Jun 11, 2024

@simonpasquier are we ok to move forward and merge this?

jan--f (Contributor) commented Jun 19, 2024

Not sure if this was discussed elsewhere already, but for a telemetry addition of this size we require management buy-in. Please get @eparis's approval for this.

zanetworker commented

@eparis any comments/objections on this? This is needed to be able to track cluster sizes in HyperShift.

zanetworker commented

@eparis Ping :)

eparis (Member) commented Aug 13, 2024

Daniel reached out to me directly. I miss 99.9999% of GitHub notifications :-( I approve.

jan--f (Contributor) commented Aug 13, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2024
jan--f (Contributor) commented Aug 13, 2024

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 13, 2024
openshift-ci bot (Contributor) commented Aug 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot (Contributor) commented

/retest-required

Remaining retests: 0 against base HEAD c9a1d8d and 2 for PR HEAD 3da125d in total

openshift-ci bot (Contributor) commented Aug 13, 2024

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 6525b00 into openshift:master Aug 13, 2024
18 checks passed
openshift-bot (Contributor) commented

[ART PR BUILD NOTIFIER]

Distgit: cluster-monitoring-operator
This PR has been included in build cluster-monitoring-operator-container-v4.18.0-202408131314.p0.g6525b00.assembly.stream.el9.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

8 participants