Add context to "Watch failed" #14134

Open
3 tasks
jsoref opened this issue Jun 20, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@jsoref
Member

jsoref commented Jun 20, 2023

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

https://cloud-native.slack.com/archives/C01TSERG0KZ/p1687186844752359?thread_ts=1687185656.068269&cid=C01TSERG0KZ

We see a lot of:

1 retrywatcher.go:130] "Watch failed" err="context canceled"

To Reproduce

Dunno

Expected behavior

Log messages should give enough context for someone reading them to understand what's going on, i.e. "I tried to do x and a watch failed".

Screenshots

Version

{
    "Version": "v2.7.4+a33baa3.dirty",
    "BuildDate": "2023-06-05T19:00:34Z",
    "GitCommit": "a33baa301fe61b899dc8bbad9e554efbc77e0991",
    "GitTreeState": "dirty",
    "GoVersion": "go1.19.6",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.0.1 2023-03-14T01:32:48Z",
    "HelmVersion": "v3.11.2+g912ebc1",
    "KubectlVersion": "v0.24.2",
    "JsonnetVersion": "v0.19.1"
}

Logs

Paste any relevant application logs here.
@jsoref jsoref added the bug Something isn't working label Jun 20, 2023
@jenting
Contributor

jenting commented Aug 3, 2023

This looks similar to an issue that was addressed in a later version.

@weslers

weslers commented Dec 6, 2023

Having the same issue and running v2.8.4+c279299

@woehrl01
Contributor

Same with v2.10.0+2175939

@black-snow

v2.11.0+d3f33c0 seeing this without a clue what's happening.

@crdrost

crdrost commented May 17, 2024

Note that this line is also one of the log lines that isn't in the right format; see #5715 for that.

@andrii-korotkov-verkada
Contributor

It seems that logs for watches have improved a lot; see https://github.com/argoproj/gitops-engine/blob/847cfc9f8b200e96a70b591a68b9fb385cf2ce56/pkg/cache/cluster.go#L607-L735. I see logs like:

Failed to watch Pod on <address>: Resyncing Pod on <address> due to timeout, retrying in 1s
Start watch Pod on <address>

Those are normal during operation, though perhaps they should be logged at debug level, not info.
What do you think?

@andrii-korotkov-verkada andrii-korotkov-verkada added the more-information-needed Further information is requested label Nov 14, 2024
@jsoref
Member Author

jsoref commented Nov 14, 2024

I'm pretty sure I still hit this. I really want someone to help me get the linked PR merged into kubernetes. I do not have the energy to fight the kubernetes project's bots/processes.

@andrii-korotkov-verkada
Contributor

I've checked for changes, and it looks like they added better handling of several error types a few months ago, e.g. https://github.com/kubernetes/kubernetes/blame/475ee33f698334e5b00c58d3bef4083840ec12c5/staging/src/k8s.io/client-go/tools/watch/retrywatcher.go#L133.

@jsoref
Member Author

jsoref commented Nov 14, 2024

I think my PR ended up w/ merge conflicts and I ran out of energy. But that project is really draining. And not remotely supportive.

@andrii-korotkov-verkada
Contributor

I see :(

@andrii-korotkov-verkada andrii-korotkov-verkada removed the more-information-needed Further information is requested label Nov 14, 2024

7 participants