fix multiListWatch resourceVersion mismatch if watch reconnected #1377
Signed-off-by: KielChan <qingya.chen520@gmail.com>
Yes, the same engineers worked on both things. :)
Thanks for the PR! Yes, I have had this in the back of my head for some time, since I worked on the prom-operator bug fix as well, but I never actually ran into it in kube-state-metrics, either in our customers' clusters or in my personal ones.
Do you mind pasting the logs and the version in which this error occurred, so we know where to aim the patch?
The logs were in the container's standard output, but the containers have since been deleted. :( I found some messages like The codes are in
I have reproduced it now:
@lilic hi, any progress?
Hey, I will take the time to review this very soon, sorry for the delay!
Asked @s-urbaniak to review, as he fixed it in Prometheus operator.
Thank you @KielChan for the contribution and the fix! As you correctly identified, the existing code relies on the invariant that a
Having said that, even with the proposed fix, multilistwatcher relies on internal behavior of client-go, which can change again at any time. After having talked to various maintainers of api-machinery tweaking the
In the long term I highly suggest migrating to native Informers per namespace, see https://github.com/prometheus-operator/prometheus-operator/tree/master/pkg/informers. Generally I suggest creating a help-wanted issue to factor out multilistwatcher in favor of native informers.
Thanks @s-urbaniak, I proposed this already in #1413. /lgtm Thank you for the fix!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: KielChan, lilic
What this PR does / why we need it: bug fix
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes # none
I ran into a problem that reports:
Failed to watch *v1.Pod: expected resource version to have 2 parts to match the number of ListerWatchers
It happens when I collect pod metrics from two namespaces with kube-state-metrics. The original code splits the combined resource version produced by the List method, but if the watch expires or times out, client-go uses the resource version of the latest event.Object to re-watch the resources, which leads to this problem.
The same problem occurred here: link. It seems to use the same code.
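
To make the failure mode concrete, here is a minimal standalone sketch of the mismatch described above. It is not the kube-state-metrics code: the `/` separator and the names `joinVersions` and `splitStrict` are assumptions made for illustration. Only the error text is taken from the log line in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// separator is an assumption for this sketch: the character used to join
// the per-namespace resource versions into one combined string.
const separator = "/"

// joinVersions (hypothetical) combines one resource version per namespace
// watcher, as a List across several namespaces would.
func joinVersions(versions []string) string {
	return strings.Join(versions, separator)
}

// splitStrict (hypothetical) models the failing behavior: it insists that
// the incoming resource version has exactly n parts, one per ListerWatcher.
func splitStrict(combined string, n int) ([]string, error) {
	parts := strings.Split(combined, separator)
	if len(parts) != n {
		return nil, fmt.Errorf("expected resource version to have %d parts to match the number of ListerWatchers", n)
	}
	return parts, nil
}

func main() {
	// Initial List over two namespaces: the combined version splits cleanly.
	combined := joinVersions([]string{"100", "205"})
	parts, err := splitStrict(combined, 2)
	fmt.Println(parts, err)

	// After a reconnect, the watch is restarted with the resource version of
	// the last event object, which comes from a single namespace and so has
	// only one part. The strict split rejects it.
	_, err = splitStrict("207", 2)
	fmt.Println(err)
}
```

The second call is the case this PR is about: the version handed back on re-watch is a plain per-object version rather than the combined one, so any code that assumes the combined shape has to handle it explicitly.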