-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new Prometheus alerts and label existing alerts #2998
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just summing up the offline discussion
cmd/cdi-controller/controller.go
Outdated
metrics.Registry.MustRegister(controller.DefaultVirtStorageClassesGauge) | ||
metrics.Registry.MustRegister(controller.DefaultStorageClassWithSmartCloneAndRWXGauge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just following up on offline discussion:
Maybe make DefaultStorageClassWithSmartCloneAndRWXGauge
just about virt storage classes? Even then I fear we can't bug people who have some basic storage solution. But as long as we see this in telemetry we can adjust ourselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree.
@@ -221,6 +221,21 @@ func getAlertRules(runbookURLTemplate string) []promv1.Rule { | |||
componentAlertLabelKey: componentAlertLabelValue, | |||
}, | |||
), | |||
generateAlertRule( | |||
"CDINoDefaultStorageClass", | |||
"kubevirt_cdi_default_storageclasses == 0 and kubevirt_cdi_default_virt_storageclasses == 0 and kubevirt_cdi_datavolumes_pending_for_default_storageclass > 0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should also just be about virt storage classes? DataImportCrons without explicit SC should all work if a virt storage class is defined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get you. User can define default and/or virt default SC, so why require the virt one specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a similar suggestion to #2998 (comment). Reading it again though I think it's fine since it'll only alert if all of them are 0, so virt sc existence will bail us out of noise.
DefaultVirtStorageClassesGauge.Set(float64(defaultVirtStorageClassCount)) | ||
|
||
hasSmartCloneAndRWX := 0 | ||
if defaultStorageClassCount > 0 || defaultVirtStorageClassCount > 0 { | ||
sc := cc.GetFallbackStorageClass(storageClassList, cdiv1.DataVolumeKubeVirt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're already (unintentionally?) always looking up the virt default storage class
(always pass cdiv1.DataVolumeKubeVirt content type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your fallback func prefers the virt default storage class, but returns the default SC if virt default is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am saying that you already unintentionally implemented #2998 (comment) since this cc.GetFallbackStorageClass(storageClassList, cdiv1.DataVolumeKubeVirt)
will only bring you virt storage classes
Hi @arnongilboa and @akalenyu I'm also working on the same files to refactor the monitoring code to use external pkg we created, and I think it will be better if I work on top of this pr. I will upload an initial pr not including this changes by the end of today, still facing compile issues... |
pkg/monitoring/metrics.go
Outdated
DefaultVirtClasses: { | ||
Name: "kubevirt_cdi_default_virt_storageclasses", | ||
Help: "Number of default virt storage classes currently configured", | ||
Type: "Gauge", | ||
}, | ||
DataVolumesPendingForDefaultStorageClass: { | ||
Name: "kubevirt_cdi_datavolumes_pending_for_default_storageclass", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is other metrics in the form of kubevirt_cdi_***_storage***
maybe it's better to name this one kubevirt_cdi_default_storageclass_pending_datavolumes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be renamed. Thanks Aviv!
It will be ready during this week after another review by @sradco, so you better wait with your PR until this one is in. |
pkg/monitoring/metrics.go
Outdated
Type: "Gauge", | ||
}, | ||
DefaultStorageClassWithSmartCloneAndRWX: { | ||
Name: "kubevirt_cdi_default_storageclass_with_smart_clone_and_rwx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match kubevirt_cdi_dataimportcron_outdated
Maybe kubevirt_cdi_default_storageclass_smart_clone_and_rwx
Also can you explain the "smart" part, e.g. kubevirt_cdi_clone_pods_high_restart
are also smart clone pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this metric, using labels on the kubevirt_cdi_storageprofile_status
metric instead.
tests/monitoring_test.go
Outdated
if f.SnapshotSCName != defaultStorageClass.Name && f.CsiCloneSCName != defaultStorageClass.Name { | ||
Skip("Default storage class does not support snapshot or CSI clone") | ||
} | ||
if !f.IsSnapshotStorageClassAvailable() && !f.IsCSIVolumeCloneStorageClassAvailable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test will always get skipped since the destructive lane only has HPP (no snapshot support)
Maybe since we don't actually need snapshot functionality, just the vsc object, we can set it up ad hoc just for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'll try your nice idea :)
tests/monitoring_test.go
Outdated
By("Restore storage profile") | ||
updateDefaultStorageClassProfileClaimPropertySets(nil) | ||
|
||
By("Ensure metric values after restoring storage profile") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to suggest to ensure that the metric is restored after the cluster state is "fixed"
but I see you already do that in all of the tests. Cool
Testing that alerts go away when things are good is really critical from my experience
c69304f
to
3a33606
Compare
/retest |
}, | ||
map[string]string{ | ||
severityAlertLabelKey: "warning", | ||
healthImpactAlertLabelKey: "warning", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arnongilboa Please make sure this is the correct impact level on the operator health.
}, | ||
map[string]string{ | ||
severityAlertLabelKey: "warning", | ||
healthImpactAlertLabelKey: "warning", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arnongilboa Please make sure this is the correct impact level on the operator health.
httpClient *http.Client | ||
|
||
// DataVolumePendingGauge is the metric we use to count the DataVolumes pending for default storage class to be configured | ||
DataVolumePendingGauge = prometheus.NewGauge( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish this variable and all metrics related functions would be in a dedicated package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each metric is maintained in the relevant packge where the logic is (DV, DIC, StorageProfile etc.). I don't think we should put them all in one package. However, the alerts are based on multiple metrics and therefore defined in a single package.
@@ -388,6 +402,25 @@ func getSourceRefOp(log logr.Logger, dv *cdiv1.DataVolume, client client.Client) | |||
} | |||
} | |||
|
|||
func updatePendingDataVolumesGauge(c client.Client, dv *cdiv1.DataVolume) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw you changed the client name from c to client in another PR so maybe you should do it here too :)
} | ||
|
||
dvList := &cdiv1.DataVolumeList{} | ||
if c.List(context.TODO(), dvList, client.MatchingFields{dvPhaseField: string(cdiv1.Pending)}) != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to log (in a low verbosity) that we failed getting the list and that why the mertic will not get updated?
dvCount++ | ||
} | ||
} | ||
DataVolumePendingGauge.Set(float64(dvCount)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not so familiar with how the metric is being calculated but is there any considerations to namespaces at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are interested in the total default-SC pending DVs in all namespaces (whether it is zero or not).
} | ||
|
||
return reconcile.Result{}, r.computeMetrics() | ||
return r.reconcileStorageProfile(storageClass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this way we will compute the metric only if no error occured are we sure thats what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We update the metric of a specific SC only if no error occured, which seems to be a consistent behavior. In case of SC deletion, we delete the metric.
// DefaultVirtStorageClassesGauge is the metric we use to alert about multiple default virt storage classes | ||
DefaultVirtStorageClassesGauge = prometheus.NewGauge( | ||
// StorageProfileStatusGaugeVec is the metric we use to alert about storage profile status | ||
StorageProfileStatusGaugeVec = prometheus.NewGaugeVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned I think it will be much cleaner to have all Prometheus functionality and variables in dedicated pkg/file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I agree, as already answered. However we can do it in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ShellyKa13, the metrics and alerts would be moved to a dedicated package as part of this PR #3009
"CDINoDefaultStorageClass", | ||
`sum(kubevirt_cdi_storageprofile_status{default="true"} or on() vector(0)) + | ||
sum(kubevirt_cdi_storageprofile_status{virtdefault="true"} or on() vector(0)) + | ||
(count(kubevirt_cdi_datavolume_pending == 0) or on() vector(0)) == 0`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt the count of datavolumes pending needs to be bigger then 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want kubevirt_cdi_datavolume_pending > 0
, so we check it here by count(kubevirt_cdi_datavolume_pending == 0) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then dont you want to count the (pending == 1) > 0?
wouldnt count(kubevirt_cdi_datavolume_pending == 0) == 0 count there are no not pending dv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way I know (discussing it with PromQL expert @machadovilaca !) to do logical &&
in such promQL expr is by using sum (+
) of the parts which we want to be 0 and compare the sum to 0.
- CDINoDefaultStorageClass - not having a default (or virt default) SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert. - CDIDefaultStorageClassDegraded - when the default (or virt default) SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration). - CDIStorageProfilesIncomplete - add storageClass and provisioner labels. - CDIDataImportCronOutdated - add dataImportCron namespace and name labels. Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Including the VolumeSnapshot/Class/Content crds for the CDIDefaultStorageClassDegraded alert func test. Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
doc/metrics.md
Outdated
### kubevirt_cdi_operator_up | ||
CDI operator status. Type: Gauge. | ||
### kubevirt_cdi_storageprofile_status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arnongilboa Please elaborate on this metric. I see from the alert that there are 3 labels: default and virtdefault, complete. What do they mean? are there additional labels?
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: akalenyu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
/retest |
/ lgtm |
/lgtm |
Manual backport of kubevirt#2998 & kubevirt#3038 - CDINoDefaultStorageClass - not having a default (or virt default) SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert. - CDIDefaultStorageClassDegraded - when the default (or virt default) SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration). - CDIStorageProfilesIncomplete - add storageClass and provisioner labels. - CDIDataImportCronOutdated - add dataImportCron namespace and name labels. Also: * Rename the metric kubevirt_cdi_storageprofile_status to kubevirt_cdi_storageprofile_info since it always reports value 1, where the label values provide the details about the storage class and storage profile. * Add snapshot manifests for tests and deploy snapshot CRDs in the hpp destructive lane Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Manual backport of kubevirt#2998 & kubevirt#3038 - CDINoDefaultStorageClass - not having a default (or virt default) SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert. - CDIDefaultStorageClassDegraded - when the default (or virt default) SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration). - CDIStorageProfilesIncomplete - add storageClass and provisioner labels. - CDIDataImportCronOutdated - add dataImportCron namespace and name labels. Also: * Rename the metric kubevirt_cdi_storageprofile_status to kubevirt_cdi_storageprofile_info since it always reports value 1, where the label values provide the details about the storage class and storage profile. * Add snapshot manifests for tests and deploy snapshot CRDs in the hpp destructive lane Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
…3040) Manual backport of #2998 & #3038 - CDINoDefaultStorageClass - not having a default (or virt default) SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert. - CDIDefaultStorageClassDegraded - when the default (or virt default) SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration). - CDIStorageProfilesIncomplete - add storageClass and provisioner labels. - CDIDataImportCronOutdated - add dataImportCron namespace and name labels. Also: * Rename the metric kubevirt_cdi_storageprofile_status to kubevirt_cdi_storageprofile_info since it always reports value 1, where the label values provide the details about the storage class and storage profile. * Add snapshot manifests for tests and deploy snapshot CRDs in the hpp destructive lane Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Manual backport of kubevirt#2998 & kubevirt#3038 - CDINoDefaultStorageClass - not having a default SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert. - CDIDefaultStorageClassDegraded - when the default SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration). - CDIStorageProfilesIncomplete - add storageClass and provisioner labels. - CDIDataImportCronOutdated - add dataImportCron namespace and name labels. Also: * Rename the metric kubevirt_cdi_storageprofile_status to kubevirt_cdi_storageprofile_info since it always reports value 1, where the label values provide the details about the storage class and storage profile. * Add snapshot manifests for tests and deploy snapshot CRDs in the hpp destructive lane Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
…3041) Manual backport of #2998 & #3038 - CDINoDefaultStorageClass - not having a default SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert. - CDIDefaultStorageClassDegraded - when the default SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration). - CDIStorageProfilesIncomplete - add storageClass and provisioner labels. - CDIDataImportCronOutdated - add dataImportCron namespace and name labels. Also: * Rename the metric kubevirt_cdi_storageprofile_status to kubevirt_cdi_storageprofile_info since it always reports value 1, where the label values provide the details about the storage class and storage profile. * Add snapshot manifests for tests and deploy snapshot CRDs in the hpp destructive lane Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Suppress alerts to reduce noise of dependent ones This is a follow-up to #2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * CR fixes Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Remove expensive func test Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Refactor updateDataImportCronCondition Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Add some verify comments Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Add IsDataVolumeUsingDefaultStorageClass helper for readability Signed-off-by: Arnon Gilboa <agilboa@redhat.com> --------- Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
) * Suppress alerts to reduce noise of dependent ones This is a follow-up to #2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * CR fixes Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Remove expensive func test Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Refactor updateDataImportCronCondition Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Add some verify comments Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * Add IsDataVolumeUsingDefaultStorageClass helper for readability Signed-off-by: Arnon Gilboa <agilboa@redhat.com> --------- Signed-off-by: Arnon Gilboa <agilboa@redhat.com> Co-authored-by: Arnon Gilboa <agilboa@redhat.com>
Manual backport of kubevirt#3129 This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Manual backport of kubevirt#3129 This is a follow-up to kubevirt#2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
) * [release-v1.58] Suppress alerts to reduce noise of dependent ones Manual backport of #3129 This is a follow-up to #2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * [release-v1.58] Get no-provisioner storage capabilities from PVs Manual backport of #3128 It's time to get the storage capabilities for all no-provisioner storage classes based on the existing PVs, and not only for those labeled with `local.storage.openshift.io/owner-name`, as done so far. It will also silence the unnecessary CDIStorageProfilesIncomplete alert for local SC in the CI. Signed-off-by: Arnon Gilboa <agilboa@redhat.com> --------- Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
) * [release-v1.57] Suppress alerts to reduce noise of dependent ones Manual backport of #3129 This is a follow-up to #2998 introducing the following changes to alert rules: - CDIDefaultStorageClassDegraded - do not fire when no default SC (either k8s or virt). - CDIDataImportCronOutdated - do not fire when no default SC (either k8s or virt), as DIC import DVs use default SC. - CDINoDefaultStorageClass - fire not only when there is a pending DV (and PVC) but also when DV has an empty status (waiting for default SC etc.) Signed-off-by: Arnon Gilboa <agilboa@redhat.com> * [release-v1.57] Get no-provisioner storage capabilities from PVs Manual backport of #3128 It's time to get the storage capabilities for all no-provisioner storage classes based on the existing PVs, and not only for those labeled with `local.storage.openshift.io/owner-name`, as done so far. It will also silence the unnecessary CDIStorageProfilesIncomplete alert for local SC in the CI. Signed-off-by: Arnon Gilboa <agilboa@redhat.com> --------- Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
What this PR does / why we need it:
CDINoDefaultStorageClass
- not having a default (or virt default) SC is surely not an OpenShift error, as admins may prefer their cluster users to only use explicit SC names. However, in the CDI context when DV is created with default SC but default does not exist, we will fire an error event and the PVC will be Pending for the default SC, so when there are such Pending PVCs we will fire an alert.CDIDefaultStorageClassDegraded
- when the default (or virt default) SC does not support CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access mode (for live migration).CDIStorageProfilesIncomplete
- add storageClass and provisioner labels.CDIDataImportCronOutdated
- add dataImportCron namespace and name labels.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Release note: