Resync not occurring for failing custom resources and Prometheus error counters don't increment #2098

Open
sergiotiu opened this issue Jun 26, 2024 · 0 comments

Describe the bug

We run the ACK Prometheus Service Controller, which adopts an AMP that is provisioned through Terraform.
We want to monitor error counters, e.g. controller_runtime_reconcile_errors_total, to detect any issues with the syncs.

When a custom resource, e.g. RuleGroupsNamespace, has the ACK.Terminal condition (details below),
we observe that the controller does not seem to retry the resync at the set interval for unhealthy resources: the error counters controller_runtime_reconcile_errors_total and ack_outbound_api_requests_error_total do not increment, and the resource status timestamps do not update.

The status timestamps of healthy resources do update, and their resync actions are captured by controller_runtime_reconcile_total and the associated success counters.
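
For anyone trying to reproduce this, a minimal sketch of how the "counters don't increment" observation could be checked: take two scrapes of the controller's metrics endpoint in Prometheus text format, one before and one after a resync interval, and compare the summed counter values. The helper names (`counter_totals`, `counter_deltas`) and the sample scrape text are hypothetical; only the two metric names come from this report.

```python
# Metric names taken from this issue; everything else is illustrative.
WATCHED = (
    "controller_runtime_reconcile_errors_total",
    "ack_outbound_api_requests_error_total",
)


def counter_totals(scrape_text, names=WATCHED):
    """Sum every label series of each watched metric in one text-format scrape."""
    totals = dict.fromkeys(names, 0.0)
    for raw in scrape_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # name{label="value",...} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # name value [timestamp]
            name, _, rest = line.partition(" ")
        if name in totals:
            totals[name] += float(rest.split()[0])
    return totals


def counter_deltas(before, after, names=WATCHED):
    """Return how much each watched counter grew between two scrapes."""
    b = counter_totals(before, names)
    a = counter_totals(after, names)
    return {name: a[name] - b[name] for name in names}


if __name__ == "__main__":
    # Hypothetical scrape contents; the label value is made up.
    sample = 'controller_runtime_reconcile_errors_total{controller="example"} 3\n'
    for name, delta in counter_deltas(sample, sample).items():
        print(name, delta)
```

A delta of zero for both counters across several resync intervals, while a resource stays in ACK.Terminal, would match the behaviour described here.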

  conditions:
    - lastTransitionTime: '2024-06-26T09:41:56Z'
      message: Resource already exists
      reason: >-
        This resource already exists but is not managed by ACK. To bring the
        resource under ACK management, you should explicitly adopt the resource
        by creating a services.k8s.aws/AdoptedResource
      status: 'True'
      type: ACK.Terminal
    - lastTransitionTime: '2024-06-26T09:41:56Z'
      message: Resource not synced
      reason: resource is in terminal condition
      status: 'False'
      type: ACK.ResourceSynced
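
As a rough illustration of how the affected resources can be identified from the status above, the sketch below checks a conditions list for ACK.Terminal and ACK.ResourceSynced. It assumes the conditions have already been loaded into Python dictionaries (for example from the resource's JSON or YAML); the function names are hypothetical and not part of ACK.

```python
def condition_status(conditions, cond_type):
    """Return the status string of the first condition of the given type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def is_terminal(conditions):
    """True when the resource reports ACK.Terminal with status 'True'."""
    return condition_status(conditions, "ACK.Terminal") == "True"


def is_synced(conditions):
    """True when the resource reports ACK.ResourceSynced with status 'True'."""
    return condition_status(conditions, "ACK.ResourceSynced") == "True"


if __name__ == "__main__":
    # Conditions copied from the status shown in this report (reason text omitted).
    conditions = [
        {
            "lastTransitionTime": "2024-06-26T09:41:56Z",
            "message": "Resource already exists",
            "status": "True",
            "type": "ACK.Terminal",
        },
        {
            "lastTransitionTime": "2024-06-26T09:41:56Z",
            "message": "Resource not synced",
            "reason": "resource is in terminal condition",
            "status": "False",
            "type": "ACK.ResourceSynced",
        },
    ]
    print(is_terminal(conditions), is_synced(conditions))
```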

Expected outcome

Failed custom resources should be resynced at the defined default resync interval, and the associated error counters should increment on each resync failure.

Environment

EKS 1.29
prometheusservice-controller:1.2.9
