Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more logs and/or duration fields to existing logs to give more visibility into slow reconciliations #18923

Closed
andrii-korotkov-verkada opened this issue Jul 3, 2024 · 3 comments · Fixed by #18926
Labels
bug/enhancement component:core Syncing, diffing, cluster state cache enhancement New feature or request type:enhancement

Comments

@andrii-korotkov-verkada
Copy link
Contributor

Summary

There's some logging for different steps taken during reconciliation, as well as some duration fields on "Reconciliation completed" log entries. However, they aren't enough to tell where the slowness may be coming from.

Motivation

Sometimes reconciliation takes 30+ min and it's unclear why. Logs don't give enough info.
Screenshot 2024-07-03 at 11 38 01 AM
Screenshot 2024-07-03 at 11 38 22 AM

Proposal

Add more log entries through different steps of reconciliation and/or add more timing/duration information to "Reconciliation completed".

@andrii-korotkov-verkada andrii-korotkov-verkada added the enhancement New feature or request label Jul 3, 2024
@alexmt alexmt added bug/enhancement component:core Syncing, diffing, cluster state cache type:enhancement labels Jul 3, 2024
@crenshaw-dev
Copy link
Member

Very strongly in favor of this. We also need documentation about these different metrics.

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 3, 2024
…proj#18923]

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 3, 2024
argoproj#18923]

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
@andrii-korotkov-verkada
Copy link
Contributor Author

@crenshaw-dev, I've sent a PR. There are a lot of metrics being added though and I've followed function names to name most of those. What's the best way to write a documentation for these? Should it be oriented at people who'd read the controller code, or should it also appeal to people who won't?

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 3, 2024
argoproj#18923]

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 3, 2024
argoproj#18923]

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.
Also, fix a flaky test in app_test.go.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 4, 2024
argoproj#18923]

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.
Also, fix a flaky test in app_test.go.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
@andrii-korotkov-verkada
Copy link
Contributor Author

Already discovered slowness/optimization opportunity! #18929

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 4, 2024
Documents argoproj#18923
Add the info about existing and new logs that are being added for reconciliation.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 8, 2024
The timing started before reconciliation timing started including get from the queue, leading to very big times reported, not making sense for what's actually going on.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
ishitasequeira pushed a commit that referenced this issue Jul 10, 2024
The timing started before reconciliation timing started including get from the queue, leading to very big times reported, not making sense for what's actually going on.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
crenshaw-dev pushed a commit to crenshaw-dev/argo-cd that referenced this issue Jul 17, 2024
argoproj#18923] (argoproj#18926)

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.
Also, fix a flaky test in app_test.go.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
ggjulio pushed a commit to ggjulio/argo-cd that referenced this issue Jul 21, 2024
argoproj#18923] (argoproj#18926)

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.
Also, fix a flaky test in app_test.go.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
ggjulio pushed a commit to ggjulio/argo-cd that referenced this issue Jul 21, 2024
The timing started before reconciliation timing started including get from the queue, leading to very big times reported, not making sense for what's actually going on.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
vfaergestad pushed a commit to vfaergestad/argo-cd that referenced this issue Jul 22, 2024
The timing started before reconciliation timing started including get from the queue, leading to very big times reported, not making sense for what's actually going on.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Signed-off-by: Vegard Færgestad <vegardf@mnemonic.no>
jsolana pushed a commit to jsolana/argo-cd that referenced this issue Jul 24, 2024
argoproj#18923] (argoproj#18926)

Closes argoproj#18923
There are some gaps in debugging information for long reconciliations. Fill in a lot of those gaps by adding more debug logs with timing information about different code execution steps.
Also, fix a flaky test in app_test.go.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Signed-off-by: Javier Solana <javier.solana@cabify.com>
Signed-off-by: Javier Solana <javier.solana@cabify.com>
jsolana pushed a commit to jsolana/argo-cd that referenced this issue Jul 24, 2024
The timing started before reconciliation timing started including get from the queue, leading to very big times reported, not making sense for what's actually going on.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Signed-off-by: Javier Solana <javier.solana@cabify.com>
Signed-off-by: Javier Solana <javier.solana@cabify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug/enhancement component:core Syncing, diffing, cluster state cache enhancement New feature or request type:enhancement
Projects
None yet
3 participants