Describe the bug
We run the ACK Prometheus Service Controller, which adopts an AMP that is provisioned through Terraform.
We want to monitor error counters, e.g. controller_runtime_reconcile_errors_total, to detect any issues with syncs.
When a custom resource, e.g. a RuleGroupsNamespace, has the ACK.Terminal condition (details below), the controller doesn't seem to retry the resync at the set interval for the unhealthy resource: the error counters controller_runtime_reconcile_errors_total and ack_outbound_api_requests_error_total don't increment, and the resource's status timestamps don't update.
Status timestamps of healthy resources do update, and their resync actions are captured by controller_runtime_reconcile_total and the associated success counters.
conditions:
  - lastTransitionTime: '2024-06-26T09:41:56Z'
    message: Resource already exists
    reason: >-
      This resource already exists but is not managed by ACK. To bring the
      resource under ACK management, you should explicitly adopt the resource
      by creating a services.k8s.aws/AdoptedResource
    status: 'True'
    type: ACK.Terminal
  - lastTransitionTime: '2024-06-26T09:41:56Z'
    message: Resource not synced
    reason: resource is in terminal condition
    status: 'False'
    type: ACK.ResourceSynced
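Since the error counters stay flat for these resources, one possible stopgap is to inspect the status conditions directly. The following is only a minimal sketch: it assumes the resource status has already been loaded into a Python dict (for example from `kubectl get ... -o json`), and the helper names `is_terminal` and `is_synced` are hypothetical, not part of ACK.

```python
from typing import Any

# Status copied from the report above; the long "reason" text is shortened.
EXAMPLE_STATUS: dict[str, Any] = {
    "conditions": [
        {
            "lastTransitionTime": "2024-06-26T09:41:56Z",
            "message": "Resource already exists",
            "reason": "This resource already exists but is not managed by ACK.",
            "status": "True",
            "type": "ACK.Terminal",
        },
        {
            "lastTransitionTime": "2024-06-26T09:41:56Z",
            "message": "Resource not synced",
            "reason": "resource is in terminal condition",
            "status": "False",
            "type": "ACK.ResourceSynced",
        },
    ]
}


def has_condition(status: dict[str, Any], cond_type: str, cond_status: str) -> bool:
    """Return True if a condition of the given type has the given status string."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type and cond.get("status") == cond_status:
            return True
    return False


def is_terminal(status: dict[str, Any]) -> bool:
    """Hypothetical helper: the resource reports ACK.Terminal = 'True'."""
    return has_condition(status, "ACK.Terminal", "True")


def is_synced(status: dict[str, Any]) -> bool:
    """Hypothetical helper: the resource reports ACK.ResourceSynced = 'True'."""
    return has_condition(status, "ACK.ResourceSynced", "True")


if __name__ == "__main__":
    print("terminal:", is_terminal(EXAMPLE_STATUS))
    print("synced:", is_synced(EXAMPLE_STATUS))
```

A check like this only reports the current condition; it does not show whether the controller attempted another reconcile.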
Expected outcome
Failed custom resources should be resynced at the default resync interval, and the associated error counters should increment on each resync failure.
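To make the expectation concrete, the sketch below compares one counter across two scrapes of the controller's metrics endpoint. It is an illustration under stated assumptions, not the controller's behaviour: the sample text, label values, and numbers are made up, the helper names are hypothetical, and the parser assumes plain Prometheus text exposition lines without trailing timestamps.

```python
def counter_total(exposition: str, metric: str) -> float:
    """Sum all samples of `metric` in Prometheus text exposition output.

    Simplification: assumes each sample line ends with the value
    (no trailing timestamp).
    """
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == metric:
            total += float(value)
    return total


def counter_increased(before: str, after: str, metric: str) -> bool:
    """Return True if the summed counter grew between the two scrapes."""
    return counter_total(after, metric) > counter_total(before, metric)


# Made-up scrapes for illustration only.
SCRAPE_BEFORE = """\
# TYPE controller_runtime_reconcile_errors_total counter
controller_runtime_reconcile_errors_total{controller="rulegroupsnamespace"} 2
controller_runtime_reconcile_total{controller="rulegroupsnamespace",result="success"} 10
"""

SCRAPE_AFTER = """\
# TYPE controller_runtime_reconcile_errors_total counter
controller_runtime_reconcile_errors_total{controller="rulegroupsnamespace"} 2
controller_runtime_reconcile_total{controller="rulegroupsnamespace",result="success"} 14
"""

if __name__ == "__main__":
    for name in ("controller_runtime_reconcile_errors_total",
                 "controller_runtime_reconcile_total"):
        print(name, "increased:", counter_increased(SCRAPE_BEFORE, SCRAPE_AFTER, name))
```

In this made-up sample the success counter grows while the error counter stays flat, which mirrors the pattern described in this report.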
Environment
EKS 1.29
prometheusservice-controller:1.2.9