Merge pull request kubernetes#46463 from wongma7/getinstances
Automatic merge from submit-queue (batch tested with PRs 46489, 46281, 46463, 46114, 43946)

AWS: consider instances of all states in DisksAreAttached, not just "running"

Require callers of `getInstancesByNodeNames(Cached)` to specify the states they want to filter instances by, if any. DisksAreAttached cannot restrict itself to "running" instances because of the following attach/detach bug we discovered:

1. Node A stops (or reboots) and stays down for x amount of time.
2. Kube reschedules all pods to different nodes; the ones using EBS volumes cannot run because their volumes are still attached to node A.
3. The "verify volumes are attached" check happens while node A is down.
4. Since the AWS EBS bulk verify filters by running nodes, it assumes the volumes attached to node A are detached and removes them all from the ASW (actual state of the world).
5. Node A comes back; its volumes are still attached to it, but the attach/detach controller has removed them all from the ASW and so will never detach them, even though they are no longer desired on this node and are in fact desired elsewhere.
6. Pods cannot run because their volumes are still attached to node A.

So the idea here is to remove the wrong assumption that callers of `getInstancesByNodeNames(Cached)` only want "running" nodes. I hope this isn't too confusing; open to alternative ways of fixing the bug while keeping the code nice.

ping @gnufied @kubernetes/sig-storage-bugs

```release-note
Fix AWS EBS volumes not getting detached from node if routine to verify volumes are attached runs while the node is down
```
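The shape of the fix described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual cloud-provider code: the `instance` type and the slice-based signature are hypothetical stand-ins, but the key point matches the PR — callers pass the states they want to filter by, and passing no states means instances in every state are returned, so a bulk attachment check still sees volumes on stopped nodes.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for an EC2 instance record.
type instance struct {
	nodeName string
	state    string // e.g. "running", "stopped"
}

// getInstancesByNodeNames sketches the fixed API shape: the caller chooses the
// state filter. An empty states list means "instances of all states".
func getInstancesByNodeNames(all []instance, nodeNames []string, states ...string) []instance {
	wantNode := make(map[string]bool)
	for _, n := range nodeNames {
		wantNode[n] = true
	}
	wantState := make(map[string]bool)
	for _, s := range states {
		wantState[s] = true
	}
	var out []instance
	for _, inst := range all {
		if !wantNode[inst.nodeName] {
			continue
		}
		// No states given: include every state, so a DisksAreAttached-style
		// check does not mistake a stopped node's volumes for detached ones.
		if len(states) > 0 && !wantState[inst.state] {
			continue
		}
		out = append(out, inst)
	}
	return out
}

func main() {
	fleet := []instance{
		{nodeName: "node-a", state: "stopped"},
		{nodeName: "node-b", state: "running"},
	}
	// Bulk-verify style call: no state filter, so stopped node-a is included.
	fmt.Println(len(getInstancesByNodeNames(fleet, []string{"node-a", "node-b"})))
	// Pre-fix behavior, made explicit: only "running" instances, node-a dropped.
	fmt.Println(len(getInstancesByNodeNames(fleet, []string{"node-a", "node-b"}, "running")))
}
```

Making the state filter an explicit parameter forces each call site to decide whether stopped instances matter, instead of baking the "running only" assumption into the helper.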