external-attacher does not see nodeID annotation for 2 min? #102
Comments
Looking at the logs, the exponential backoff is weird. The controller got synced at these intervals:
Recalculated to units of 10 milliseconds between calls:
It shows some exponential growth from 16 through 32, 64 and 128, but it misses 256, 1024 and a lot of the numbers after 2048. Unfortunately, the informer does not log anything. As a side note, the short 1-2 ms intervals in between are caused by the timestamp in VolumeAttachment.Status.Error: when the backoff timer fires, the controller retries the attach and writes a new timestamp to Status. This triggers a new "VolumeAttachment changed" event and the controller reconciles the object again. It tries to attach the volume again and writes a new .Status. At this point the timestamp is the same as before (it is rounded to seconds), so no new "VolumeAttachment changed" event is generated. IMO we could do something about it and not retry the attach so quickly; however, it is most probably not the root cause of this bug.
This "VolumeAttachment changed" event is related to the bug. The new attach fails and the VolumeAttachment is re-enqueued with a new interval, overwriting the old one. So the VolumeAttachment is enqueued with timeout 2*N instead of N, as we can see above. I thought the workqueue was smart enough not to enqueue an item that is still in the queue. I'll ignore "VolumeAttachment updated" events if the only difference is VolumeAttachment.Status.Error. That does not explain why there is such a long gap between 2046.0 (=20s) and 16384 (=168s), though. That gap is where the 2.5 minutes go; the previous syncs take ~30 seconds together.
I created #104; it no longer retries attach/detach on "VolumeAttachment changed" events caused by an AttachError write.
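The idea of skipping such events can be sketched roughly as follows. This is not the code from #104: the struct types are simplified stand-ins for the real VolumeAttachment API type, and `sameIgnoringAttachError` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// Simplified stand-ins for the real API types (assumption for illustration).
type attachError struct {
	Time    time.Time
	Message string
}

type status struct {
	Attached    bool
	AttachError *attachError
}

type volumeAttachment struct {
	Name     string
	NodeName string
	Status   status
}

// sameIgnoringAttachError reports whether two versions of an object are
// identical once Status.AttachError is disregarded. The arguments are
// copies, so clearing the field does not touch the caller's objects.
func sameIgnoringAttachError(old, cur volumeAttachment) bool {
	old.Status.AttachError = nil
	cur.Status.AttachError = nil
	return reflect.DeepEqual(old, cur)
}

func main() {
	old := volumeAttachment{Name: "va-1", NodeName: "node-a"}

	// Update that only records a failed attach: can be skipped.
	failed := old
	failed.Status.AttachError = &attachError{
		Time:    time.Date(2018, 1, 1, 12, 0, 0, 0, time.UTC),
		Message: "attach failed",
	}
	fmt.Println("skip error-only update:", sameIgnoringAttachError(old, failed))

	// Update that changes something else: must still be reconciled.
	attached := failed
	attached.Status.Attached = true
	fmt.Println("skip real update:", sameIgnoringAttachError(old, attached))
}
```

An update handler built on this check would enqueue the object only when the function returns false, leaving error-only writes to the existing backoff timer.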
@fejta-bot: Closing this issue.
This is an analysis of the following test flake: https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/4398
So for the most part, everything seems to be operating as expected.
The outstanding question is: why did the external-attacher not see the nodeID annotation until 2 min after it was added to the node?