add ClusterResourceQuota prometheus alert for overprovisioning #1311

Merged

Contributor

@anmolsachan anmolsachan commented Aug 24, 2021

Depends on #1282. This PR adds the alert that notifies users when the thresholds set via the functionality in #1282 are reached.

Signed-off-by: Anmol Sachan <anmol13694@gmail.com>

@anmolsachan anmolsachan force-pushed the overprovision_alert branch 3 times, most recently from fc5aa97 to 93c4f30 Compare August 24, 2021 10:33
@anmolsachan
Contributor Author

@umangachapagain @synarete Please review.

@anmolsachan anmolsachan changed the title from "add ClusterResourceQuota alert" to "add ClusterResourceQuota prometheus alert for overprovisioning" Aug 24, 2021
@umangachapagain umangachapagain added this to the OCS 4.9 milestone Aug 25, 2021
@openshift-ci openshift-ci bot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Aug 25, 2021
@openshift-ci openshift-ci bot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Aug 26, 2021
@jarrpa jarrpa added the mvp (Required for the next minimum viable product.) and priority/1-high labels Aug 26, 2021
Member

@jarrpa jarrpa left a comment

Looks sane at first glance, but I'm not an expert in any of this. Will let others LGTM.

@@ -5,9 +5,11 @@

// Duration to raise various Alerts
clusterObjectStoreStateAlertTime: '15s',
clusterResourceQuotaAlertTime: '0s',
Contributor

0s feels too aggressive. Any reason why it can't be 5s or 10s?
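
To make the trade-off in this question concrete, here is a minimal sketch of how an alert hold duration behaves. It assumes that clusterResourceQuotaAlertTime ends up as the `for` duration of the Prometheus rule (the diff hunk above does not show that), and the 5s evaluation interval and the `firing_states` helper are hypothetical. It is a simplified model, not the Prometheus implementation.

```python
from typing import List, Optional


def firing_states(condition: List[bool], interval_s: int, for_s: int) -> List[str]:
    """Simulate the alert state at each rule evaluation.

    condition[i] says whether the alert expression was true at evaluation i.
    The alert fires once the expression has stayed true for at least for_s seconds.
    """
    states: List[str] = []
    active_since: Optional[int] = None  # time at which the expression first became true
    for i, is_true in enumerate(condition):
        now = i * interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the hold timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Hypothetical timeline: the quota threshold is crossed at the second evaluation.
timeline = [False, True, True, True]
print(firing_states(timeline, interval_s=5, for_s=0))
# ['inactive', 'firing', 'firing', 'firing']
print(firing_states(timeline, interval_s=5, for_s=10))
# ['inactive', 'pending', 'pending', 'firing']
```

Under these assumptions, a hold of 0s fires at the first evaluation that sees the threshold crossed, while 10s delays the alert by two further evaluations.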

Contributor Author

Because PVC requests are absolute amounts, such as 50GB or 100GB, and don't grow gradually the way usage does.

For example:

If the quota is set to 1TB and 750GB is already provisioned, a new PVC request of 100GB brings the total to 850GB. Users should be informed immediately, because the ClusterResourceQuota will reject any further PVC request that exceeds the remaining 150GB.
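
The arithmetic in the example above can be sketched as follows. The 80% threshold is the figure given in this PR's commit message. The `quota_status` helper and its field names are hypothetical, and 1TB is treated as 1000GB, as the example implies.

```python
ALERT_THRESHOLD_PERCENT = 80  # threshold stated in this PR's commit message


def quota_status(limit_gb: int, provisioned_gb: int, request_gb: int) -> dict:
    """Describe the quota after one more PVC request (hypothetical helper)."""
    total_gb = provisioned_gb + request_gb
    admitted = total_gb <= limit_gb  # a request that would exceed the quota is rejected
    if not admitted:
        total_gb = provisioned_gb  # a rejected request does not count against the quota
    return {
        "admitted": admitted,
        "total_gb": total_gb,
        "remaining_gb": limit_gb - total_gb,
        # Integer comparison avoids floating-point rounding at the boundary.
        "alert": total_gb * 100 > limit_gb * ALERT_THRESHOLD_PERCENT,
    }


print(quota_status(limit_gb=1000, provisioned_gb=750, request_gb=100))
# {'admitted': True, 'total_gb': 850, 'remaining_gb': 150, 'alert': True}
print(quota_status(limit_gb=1000, provisioned_gb=850, request_gb=200)["admitted"])
# False
```

The second call shows the situation the comment warns about: once only 150GB remains, a 200GB request is rejected outright, so there is no gradual growth that a longer alert delay could smooth over.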

Contributor

@umangachapagain umangachapagain left a comment

Good to merge, but holding off until the dependent PR is merged.

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Aug 27, 2021
@umangachapagain
Contributor

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 27, 2021
@jarrpa
Member

jarrpa commented Aug 27, 2021

Dependent PR merged.

/hold cancel

@openshift-ci openshift-ci bot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) and removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 27, 2021
@openshift-ci openshift-ci bot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Aug 30, 2021
This commit adds a Prometheus alert that notifies users when the PVC
requests for a particular storageclass go beyond 80% of the limit
set by the user through the ClusterResourceQuota resource.

Signed-off-by: Anmol Sachan <anmol13694@gmail.com>
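
The per-storageclass condition described in the commit message can be sketched as below. This is only an illustration of the 80% check, not the alert expression added by this PR. The `overprovisioned` helper, the storageclass names, and the GB figures are hypothetical.

```python
from typing import Dict, List, Tuple

ALERT_THRESHOLD_PERCENT = 80  # threshold stated in the commit message


def overprovisioned(quotas: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the storageclasses whose requested storage exceeds the threshold.

    quotas maps a storageclass name to (requested_gb, limit_gb).
    """
    return sorted(
        name
        for name, (requested_gb, limit_gb) in quotas.items()
        # Skip classes without a positive limit; "beyond 80%" is a strict comparison.
        if limit_gb > 0 and requested_gb * 100 > limit_gb * ALERT_THRESHOLD_PERCENT
    )


# Hypothetical storageclasses and values, in GB.
quotas = {
    "sc-block": (850, 1000),   # 85% of the limit: alert
    "sc-file": (400, 1000),    # 40% of the limit: no alert
    "sc-object": (800, 1000),  # exactly 80%: not beyond the threshold
}
print(overprovisioned(quotas))
# ['sc-block']
```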
@agarwal-mudit
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Aug 30, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 30, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: agarwal-mudit, umangachapagain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [agarwal-mudit,umangachapagain]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 8c9b10a into red-hat-storage:master Aug 30, 2021
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
mvp: Required for the next minimum viable product.
priority/1-high