Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Prometheus alerts and label existing alerts #2998

Merged
merged 9 commits into from
Dec 19, 2023

Conversation

arnongilboa
Copy link
Collaborator

@arnongilboa arnongilboa commented Nov 22, 2023

What this PR does / why we need it:

  • CDINoDefaultStorageClass - not having a default (or virt default) SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert.
  • CDIDefaultStorageClassDegraded - when the default (or virt default) SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration).
  • CDIStorageProfilesIncomplete - add storageClass and provisioner labels.
  • CDIDataImportCronOutdated - add dataImportCron namespace and name labels.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Add Prometheus alerts CDINoDefaultStorageClass and CDIDefaultStorageClassDegraded; add informative labels to existing alerts CDIStorageProfilesIncomplete and CDIDataImportCronOutdated

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 22, 2023
doc/metrics.md Outdated Show resolved Hide resolved
@arnongilboa arnongilboa changed the title Add new Prometheus alerts and label existing alerts [WIP] Add new Prometheus alerts and label existing alerts Nov 23, 2023
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 23, 2023
Copy link
Collaborator

@akalenyu akalenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just summing up the offline discussion

metrics.Registry.MustRegister(controller.DefaultVirtStorageClassesGauge)
metrics.Registry.MustRegister(controller.DefaultStorageClassWithSmartCloneAndRWXGauge)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just following up on offline discussion:
Maybe make DefaultStorageClassWithSmartCloneAndRWXGauge just about virt storage classes? Even then I fear we can't bug people who have some basic storage solution. But as long as we see this in telemetry we can adjust ourselves

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree.

@@ -221,6 +221,21 @@ func getAlertRules(runbookURLTemplate string) []promv1.Rule {
componentAlertLabelKey: componentAlertLabelValue,
},
),
generateAlertRule(
"CDINoDefaultStorageClass",
"kubevirt_cdi_default_storageclasses == 0 and kubevirt_cdi_default_virt_storageclasses == 0 and kubevirt_cdi_datavolumes_pending_for_default_storageclass > 0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should also just be about virt storage classes? DataImportCrons without explicit SC should all work if a virt storage class is defined

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get you. User can define default and/or virt default SC, so why require the virt one specifically?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a similar suggestion to #2998 (comment). Reading it again though I think it's fine since it'll only alert if all of them are 0, so virt sc existence will bail us out of noise.

DefaultVirtStorageClassesGauge.Set(float64(defaultVirtStorageClassCount))

hasSmartCloneAndRWX := 0
if defaultStorageClassCount > 0 || defaultVirtStorageClassCount > 0 {
sc := cc.GetFallbackStorageClass(storageClassList, cdiv1.DataVolumeKubeVirt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're already (unintentionally?) always looking up the virt default storage class
(always pass cdiv1.DataVolumeKubeVirt content type)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your fallback func prefers the virt default storage class, but returns the default SC if virt default is not set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am saying that you already unintentionally implemented #2998 (comment) since this cc.GetFallbackStorageClass(storageClassList, cdiv1.DataVolumeKubeVirt) will only bring you virt storage classes

@avlitman
Copy link
Contributor

avlitman commented Nov 26, 2023

Hi @arnongilboa and @akalenyu I'm also working on the same files to refactor the monitoring code to use external pkg we created, and I think it will be better if I work on top of this pr.
The question is, do you think this pr will be ready soon, or is it still WIP?

I will upload an initial pr not including this changes by the end of today, still facing compile issues...
Thanks!

DefaultVirtClasses: {
Name: "kubevirt_cdi_default_virt_storageclasses",
Help: "Number of default virt storage classes currently configured",
Type: "Gauge",
},
DataVolumesPendingForDefaultStorageClass: {
Name: "kubevirt_cdi_datavolumes_pending_for_default_storageclass",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is other metrics in the form of kubevirt_cdi_***_storage***
maybe it's better to name this one kubevirt_cdi_default_storageclass_pending_datavolumes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be renamed. Thanks Aviv!

@arnongilboa
Copy link
Collaborator Author

Hi @arnongilboa and @akalenyu I'm also working on the same files to refactor the monitoring code to use external pkg we created, and I think it will be better if I work on top of this pr. The question is, do you think this pr will be ready soon, or is it still WIP?

I will upload an initial pr not including this changes by the end of today, still facing compile issues... Thanks!

It will be ready during this week after another review by @sradco, so you better wait with your PR until this one is in.

Type: "Gauge",
},
DefaultStorageClassWithSmartCloneAndRWX: {
Name: "kubevirt_cdi_default_storageclass_with_smart_clone_and_rwx",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match kubevirt_cdi_dataimportcron_outdated
Maybe kubevirt_cdi_default_storageclass_smart_clone_and_rwx

Also can you explain the "smart" part, e.g. kubevirt_cdi_clone_pods_high_restart are also smart clone pods?

Copy link
Collaborator Author

@arnongilboa arnongilboa Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this metric, using labels on the kubevirt_cdi_storageprofile_status metric instead.

if f.SnapshotSCName != defaultStorageClass.Name && f.CsiCloneSCName != defaultStorageClass.Name {
Skip("Default storage class does not support snapshot or CSI clone")
}
if !f.IsSnapshotStorageClassAvailable() && !f.IsCSIVolumeCloneStorageClassAvailable() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test will always get skipped since the destructive lane only has HPP (no snapshot support)
Maybe since we don't actually need snapshot functionality, just the vsc object, we can set it up ad hoc just for this test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'll try your nice idea :)

By("Restore storage profile")
updateDefaultStorageClassProfileClaimPropertySets(nil)

By("Ensure metric values after restoring storage profile")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to suggest to ensure that the metric is restored after the cluster state is "fixed"
but I see you already do that in all of the tests. Cool
Testing that alerts go away when things are good is really critical from my experience

@arnongilboa
Copy link
Collaborator Author

/retest

},
map[string]string{
severityAlertLabelKey: "warning",
healthImpactAlertLabelKey: "warning",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnongilboa Please make sure this is the correct impact level on the operator health.

},
map[string]string{
severityAlertLabelKey: "warning",
healthImpactAlertLabelKey: "warning",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnongilboa Please make sure this is the correct impact level on the operator health.

httpClient *http.Client

// DataVolumePendingGauge is the metric we use to count the DataVolumes pending for default storage class to be configured
DataVolumePendingGauge = prometheus.NewGauge(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish this variable and all metrics related functions would be in a dedicated package

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each metric is maintained in the relevant packge where the logic is (DV, DIC, StorageProfile etc.). I don't think we should put them all in one package. However, the alerts are based on multiple metrics and therefore defined in a single package.

@@ -388,6 +402,25 @@ func getSourceRefOp(log logr.Logger, dv *cdiv1.DataVolume, client client.Client)
}
}

func updatePendingDataVolumesGauge(c client.Client, dv *cdiv1.DataVolume) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw you changed the client name from c to client in another PR so maybe you should do it here too :)

}

dvList := &cdiv1.DataVolumeList{}
if c.List(context.TODO(), dvList, client.MatchingFields{dvPhaseField: string(cdiv1.Pending)}) != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to log (in a low verbosity) that we failed getting the list and that why the mertic will not get updated?

dvCount++
}
}
DataVolumePendingGauge.Set(float64(dvCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not so familiar with how the metric is being calculated but is there any considerations to namespaces at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are interested in the total default-SC pending DVs in all namespaces (whether it is zero or not).

}

return reconcile.Result{}, r.computeMetrics()
return r.reconcileStorageProfile(storageClass)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this way we will compute the metric only if no error occured are we sure thats what we want?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We update the metric of a specific SC only if no error occured, which seems to be a consistent behavior. In case of SC deletion, we delete the metric.

// DefaultVirtStorageClassesGauge is the metric we use to alert about multiple default virt storage classes
DefaultVirtStorageClassesGauge = prometheus.NewGauge(
// StorageProfileStatusGaugeVec is the metric we use to alert about storage profile status
StorageProfileStatusGaugeVec = prometheus.NewGaugeVec(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned I think it will be much cleaner to have all Prometheus functionality and variables in dedicated pkg/file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree, as already answered. However we can do it in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ShellyKa13, the metrics and alerts would be moved to a dedicated package as part of this PR #3009

"CDINoDefaultStorageClass",
`sum(kubevirt_cdi_storageprofile_status{default="true"} or on() vector(0)) +
sum(kubevirt_cdi_storageprofile_status{virtdefault="true"} or on() vector(0)) +
(count(kubevirt_cdi_datavolume_pending == 0) or on() vector(0)) == 0`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt the count of datavolumes pending needs to be bigger then 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want kubevirt_cdi_datavolume_pending > 0, so we check it here by count(kubevirt_cdi_datavolume_pending == 0) == 0

Copy link
Contributor

@ShellyKa13 ShellyKa13 Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then dont you want to count the (pending == 1) > 0?
wouldnt count(kubevirt_cdi_datavolume_pending == 0) == 0 count there are no not pending dv?

Copy link
Collaborator Author

@arnongilboa arnongilboa Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way I know (discussing it with PromQL expert @machadovilaca !) to do logical && in such promQL expr is by using sum (+) of the parts which we want to be 0 and compare the sum to 0.

- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context when
DV is created with default SC but default does not exist, we will fire
an error event and the PVC will be Pending for the default SC, so when
there are such Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@arnongilboa arnongilboa changed the title [WIP] Add new Prometheus alerts and label existing alerts Add new Prometheus alerts and label existing alerts Dec 11, 2023
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2023
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Including the VolumeSnapshot/Class/Content crds for the
CDIDefaultStorageClassDegraded alert func test.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
doc/metrics.md Outdated
### kubevirt_cdi_operator_up
CDI operator status. Type: Gauge.
### kubevirt_cdi_storageprofile_status
Copy link
Contributor

@sradco sradco Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnongilboa Please elaborate on this metric. I see from the alert that there are 3 labels: default and virtdefault, complete. What do they mean? are there additional labels?

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akalenyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2023
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@arnongilboa
Copy link
Collaborator Author

/retest

@ShellyKa13
Copy link
Contributor

/ lgtm

@sradco
Copy link
Contributor

sradco commented Dec 19, 2023

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2023
@kubevirt-bot kubevirt-bot merged commit edda5ab into kubevirt:main Dec 19, 2023
18 checks passed
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Dec 25, 2023
Manual backport of kubevirt#2998 & kubevirt#3038

- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context when
DV is created with default SC but default does not exist, we will fire
an error event and the PVC will be Pending for the default SC, so when
there are such Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Also:
* Rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage class and
storage profile.
* Add snapshot manifests for tests and deploy snapshot CRDs in the hpp
destructive lane

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Dec 25, 2023
Manual backport of kubevirt#2998 & kubevirt#3038

- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context when
DV is created with default SC but default does not exist, we will fire
an error event and the PVC will be Pending for the default SC, so when
there are such Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Also:
* Rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage class and
storage profile.
* Add snapshot manifests for tests and deploy snapshot CRDs in the hpp
destructive lane

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot pushed a commit that referenced this pull request Dec 26, 2023
…3040)

Manual backport of #2998 & #3038

- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context when
DV is created with default SC but default does not exist, we will fire
an error event and the PVC will be Pending for the default SC, so when
there are such Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Also:
* Rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage class and
storage profile.
* Add snapshot manifests for tests and deploy snapshot CRDs in the hpp
destructive lane

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Dec 26, 2023
Manual backport of kubevirt#2998 & kubevirt#3038

- CDINoDefaultStorageClass - not having a default SC is surely not an
OpenShift error, as admins may prefer their cluster users to only use
explicit SC names. However, in the CDI context when DV is created with
default SC but default does not exist, we will fire an error event and
the PVC will be Pending for the default SC, so when there are such
Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default SC does not support
CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access
mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Also:
* Rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage class and
storage profile.
* Add snapshot manifests for tests and deploy snapshot CRDs in the hpp
destructive lane

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot pushed a commit that referenced this pull request Jan 3, 2024
…3041)

Manual backport of #2998 & #3038

- CDINoDefaultStorageClass - not having a default SC is surely not an
OpenShift error, as admins may prefer their cluster users to only use
explicit SC names. However, in the CDI context when DV is created with
default SC but default does not exist, we will fire an error event and
the PVC will be Pending for the default SC, so when there are such
Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default SC does not support
CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access
mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Also:
* Rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage class and
storage profile.
* Add snapshot manifests for tests and deploy snapshot CRDs in the hpp
destructive lane

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Mar 12, 2024
This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Mar 12, 2024
This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Apr 2, 2024
This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Apr 7, 2024
This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot pushed a commit that referenced this pull request Apr 7, 2024
* Suppress alerts to reduce noise of dependent ones

This is a follow-up to #2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Remove expensive func test

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Refactor updateDataImportCronCondition

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add some verify comments

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add IsDataVolumeUsingDefaultStorageClass helper for readability

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot pushed a commit to kubevirt-bot/containerized-data-importer that referenced this pull request Apr 8, 2024
This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot added a commit that referenced this pull request Apr 8, 2024
)

* Suppress alerts to reduce noise of dependent ones

This is a follow-up to #2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Remove expensive func test

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Refactor updateDataImportCronCondition

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add some verify comments

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add IsDataVolumeUsingDefaultStorageClass helper for readability

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Co-authored-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Apr 10, 2024
Manual backport of kubevirt#3129

This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:
- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
arnongilboa added a commit to arnongilboa/containerized-data-importer that referenced this pull request Apr 11, 2024
Manual backport of kubevirt#3129

This is a follow-up to kubevirt#2998 introducing the following changes to alert
rules:
- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot pushed a commit that referenced this pull request Apr 11, 2024
)

* [release-v1.58] Suppress alerts to reduce noise of dependent ones

Manual backport of #3129

This is a follow-up to #2998 introducing the following changes to alert
rules:
- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* [release-v1.58] Get no-provisioner storage capabilities from PVs

Manual backport of #3128

It's time to get the storage capabilities for all no-provisioner storage
classes based on the existing PVs, and not only for those labeled with
`local.storage.openshift.io/owner-name`, as done so far.
It will also silence the unnecessary CDIStorageProfilesIncomplete alert
for local SC in the CI.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
kubevirt-bot pushed a commit that referenced this pull request Apr 11, 2024
)

* [release-v1.57] Suppress alerts to reduce noise of dependent ones

Manual backport of #3129

This is a follow-up to #2998 introducing the following changes to alert
rules:
- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* [release-v1.57] Get no-provisioner storage capabilities from PVs

Manual backport of #3128

It's time to get the storage capabilities for all no-provisioner storage
classes based on the existing PVs, and not only for those labeled with
`local.storage.openshift.io/owner-name`, as done so far.
It will also silence the unnecessary CDIStorageProfilesIncomplete alert
for local SC in the CI.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants