Improve refreshes observability #17122

Closed
jsolana opened this issue Feb 7, 2024 · 2 comments
Labels
component:core (Syncing, diffing, cluster state cache), enhancement (New feature or request)

Comments

@jsolana
Contributor

jsolana commented Feb 7, 2024

Summary

Enhance observability of refresh operations.

Related to: #8192

Motivation

Reducing the feedback loop: In self-service environments that manage dozens of applications, applications are often installed and updated autonomously. This sometimes causes unexpected behavior, where certain resources or controllers start interfering with the application controller by updating untracked or orphaned resources.

Identifying and addressing these situations can be time-consuming. Tracing the source of these refresh requests often requires changing the log level to Debug, which prolongs the time needed to resolve performance issues in production environments. Sync loops can also affect other applications, causing them to become stuck in an Unknown or OutOfSync status.

As a temporary workaround, one can configure the application controller to log at the Debug level indefinitely, but this produces verbose logs and increases I/O consumption.

Given the impact of this issue, good observability is crucial for anticipating problems and applying reconciliation optimizations before the system degrades.

Proposal

Introduce a new counter metric in the application controller, named argocd_app_refresh_total, to track the number of refresh requests.

Labels: namespace, server, application name, and is_managed (to identify whether the updated object that triggered the refresh is managed or an orphan).
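To make the proposal concrete, here is a rough, dependency-free sketch of what a labeled refresh counter could look like. It is not the Argo CD implementation: the real controller would more likely register a counter vector through its existing Prometheus metrics server. The type and function names (RefreshLabels, RefreshCounter, NewRefreshCounter) and the `name` label key for the application name are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// metricName is the metric name proposed in this issue.
const metricName = "argocd_app_refresh_total"

// RefreshLabels holds one combination of the proposed label values.
// The struct and its field names are hypothetical.
type RefreshLabels struct {
	Namespace string
	Server    string
	AppName   string
	IsManaged bool // false when the triggering object is an orphan
}

// RefreshCounter is a stdlib-only stand-in for a Prometheus counter vector.
type RefreshCounter struct {
	mu     sync.Mutex
	counts map[RefreshLabels]uint64
}

// NewRefreshCounter returns an empty counter.
func NewRefreshCounter() *RefreshCounter {
	return &RefreshCounter{counts: make(map[RefreshLabels]uint64)}
}

// Inc records one refresh request for the given label combination.
func (c *RefreshCounter) Inc(l RefreshLabels) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[l]++
}

// Value returns the current count for the given label combination.
func (c *RefreshCounter) Value(l RefreshLabels) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[l]
}

// Render prints the series in a Prometheus-like text form, sorted for
// stable output. Label escaping via %q only approximates the real format.
func (c *RefreshCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for l, n := range c.counts {
		lines = append(lines, fmt.Sprintf(
			"%s{namespace=%q,server=%q,name=%q,is_managed=\"%t\"} %d",
			metricName, l.Namespace, l.Server, l.AppName, l.IsManaged, n))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	c := NewRefreshCounter()
	// Example label values are made up for the demonstration.
	managed := RefreshLabels{
		Namespace: "argocd",
		Server:    "https://kubernetes.default.svc",
		AppName:   "guestbook",
		IsManaged: true,
	}
	orphan := managed
	orphan.IsManaged = false

	c.Inc(managed) // refresh triggered by a managed resource
	c.Inc(orphan)  // refreshes triggered by an orphaned resource
	c.Inc(orphan)
	fmt.Println(c.Render())
}
```

With a series like this, a quickly growing count for is_managed="false" on one application would point at orphaned resources driving the refreshes, without switching the controller to Debug logging.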

Additionally, considering the relevance of orphan monitoring, it might be beneficial to relocate the orphan monitoring documentation under the operator manual instead of the user manual.

@jsolana jsolana added the enhancement New feature or request label Feb 7, 2024
@jsolana jsolana changed the title Promote resource triggering reconciliation log to info Improve refreshes observability Feb 9, 2024
@jsolana
Contributor Author

jsolana commented Feb 9, 2024

cc: @crenshaw-dev

@jsolana
Contributor Author

jsolana commented Feb 10, 2024

Instead of is_managed as a label, maybe we can use compare_with. Refreshes requested because of updates to orphaned resources would have the value ComparisonWithNothing.

Wdyt?

@jgwest jgwest added the component:core Syncing, diffing, cluster state cache label Feb 15, 2024
@jsolana jsolana closed this as completed Feb 21, 2024