Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix agent stuck on drain bug #181

Merged
merged 1 commit into from
Apr 6, 2022

Conversation

gthao313
Copy link
Member

@gthao313 gthao313 commented Apr 4, 2022

Issue number:
#168

Description of changes:

    fix agent stuck on drain bug

    Some customers are using StatefulSet which makes pod's name unchangeable,
    and we use the name of the evicted pod and query the k8s API on wait_for_deletion
    fucntion, waiting for it to 404 before declaring a pod deleted.
    Therefore, current logic can't handle this sepcial case. We add a new if
    statement condition to compare two pods by uid to make sure if they are the
    same pod. If they are same pod, brupop should wait until it be deleted,
    otherwise we should recognize the old pod has already been deleted and
    re-lives on other node.

Testing done:

  1. Apply StatefulSet to cluster
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          5m32s
web-1   1/1     Running   0          10m
web-2   1/1     Running   0          5m22s


Name:         web-0
Namespace:    default
Priority:     0
Node:         ip-192-168-156-161.us-west-2.compute.internal/192.168.156.161
....
  1. Run brupop integration test and monitor update
NAME                                                STATE   VERSION   TARGET STATE   TARGET VERSION   CRASH COUNT
brs-ip-192-168-128-161.us-west-2.compute.internal   Idle    1.7.0     Idle                            0
brs-ip-192-168-144-77.us-west-2.compute.internal    Idle    1.7.0     Idle                            0
brs-ip-192-168-156-161.us-west-2.compute.internal   Idle    1.7.0     Idle                            0

apiserver log

[EVICT_POD - EVENT] Attempting to evict pod web-0",
[EVICT_POD - EVENT] Successfully evicted Pod 'web-0'"
[WAIT_FOR_DELETION - EVENT] Pod web-0 deleted.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Comment on lines 291 to 293
Ok(p) => {
if p.uid() != pod.uid() {
event!(Level::INFO, "Pod {} deleted.", p.name(),);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

I'd suggest using "match guards" here to keep the indentation level as before. e.g.

Err(kube::Error::Api(e)) if e.code == 404 => {...},
Ok(p) if p.uid() != pod.uid() => {...}
Ok(p) => {...},

Some customers are using StatefulSet which makes pod's name unchangeable,
and we use the name of the evicted pod and query the k8s API on wait_for_deletion
fucntion, waiting for it to 404 before declaring a pod deleted.
Therefore, current logic can't handle this sepcial case. We add a new if
statement condition to compare two pods by uid to make sure if they are the
same pod. If they are same pod, brupop should wait until it be deleted,
otherwise we should recognize the old pod has already been deleted and
re-lives on other node.
@gthao313 gthao313 merged commit 4022737 into bottlerocket-os:develop Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants