
[BUG] Advanced DaemonSet stuck in Rolling Update #1012

Closed
aaronseahyh opened this issue Jun 30, 2022 · 3 comments · Fixed by #1014
Labels
kind/bug Something isn't working

Comments

@aaronseahyh

What happened:
During a RollingUpdate, the old daemon pods are not terminated after the new daemon pods are successfully created and running.

What you expected to happen:
Old daemon pods should be terminated once the new daemon pods are successfully created and running.

How to reproduce it (as minimally and precisely as possible):

  1. Install kruise v1.2.0
  2. Create the Advanced DaemonSet from ads.yaml (manifest below)
  3. Update the image version (ads.yaml#L28; v2.6.0 -> v2.7.0); see the command sketch after this list
    a. Observe that the new daemon pods are created and running
    b. Observe that the old daemon pods are not terminated
  4. Delete an old daemon pod with kubectl delete pod <pod_name> OR update the image again (ads.yaml#L28; v2.7.0 -> v2.8.0)
    a. Observe that the other old daemon pods are then terminated
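
For reference, a minimal command sketch for steps 2-3, assuming the manifest below is saved as ads.yaml, kruise v1.2.0 is already installed, and the image is updated by editing ads.yaml and re-applying it (the label selector name=fluentd-elasticsearch comes from the manifest):

# step 2: create the Advanced DaemonSet
kubectl apply -f ads.yaml

# step 3: edit the image tag in ads.yaml (v2.6.0 -> v2.7.0), then re-apply
kubectl apply -f ads.yaml

# watch the rollout: the new pods come up, but the old ones are never terminated
kubectl get pods -l name=fluentd-elasticsearch -w
kubectl get daemonsets.apps.kruise.io ads-test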

Anything else we need to know?:

  • After manually terminating one of the old daemon pods with kubectl delete pod <pod_name>, the rest of the old daemon pods start terminating (see the sketch after this list).
  • When another RollingUpdate is started on the same Advanced DaemonSet, the old daemon pods start terminating.
  • Advanced DaemonSet RollingUpdate worked fine in kruise v1.0.1.
  • The same bug is present in kruise v1.1.0.
  • The same bug is present with kubectl v1.24.
  • The same bug is present when kruise is installed with helm install kruise openkruise/kruise --version 1.2.0.
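
A minimal sketch of the manual workaround from the first bullet (the pod name is a placeholder for one of the old-generation pods):

# pick one of the old-generation pods
kubectl get pods -l name=fluentd-elasticsearch

# deleting a single old pod unblocks the rollout; the remaining old pods then terminate
kubectl delete pod <pod_name>
kubectl get pods -l name=fluentd-elasticsearch -w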

Environment:

  • Kruise version: v1.2.0
  • Kubernetes version (use kubectl version): v1.18
  • Install details (e.g. helm install args):
helm fetch --untar --untardir charts openkruise/kruise --version 1.2.0
helm template charts/kruise > kruise-1.2.0.yaml
kustomize create --autodetect
kubectl apply -k .
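
Before reproducing, the installation can be confirmed with something like the following (a sketch; kruise-system is the chart's default namespace, and daemonsets.apps.kruise.io is the Advanced DaemonSet CRD):

# the kruise controller manager pods should be Running
kubectl get pods -n kruise-system

# the Advanced DaemonSet CRD should be registered
kubectl get crd daemonsets.apps.kruise.io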

Advanced DaemonSet manifest - ads.yaml (adapted from the DaemonSet example in the Kubernetes documentation)

apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
  name: ads-test
spec:
  minReadySeconds: 15
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 100%
      partition: 0
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      # this toleration is to have the daemonset runnable on master nodes
      # remove it if your masters can't run pods
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.6.0
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
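
While the rollout is stuck, the state can also be inspected through the Advanced DaemonSet status and the pod images (a sketch; the resource name daemonsets.apps.kruise.io follows from the apiVersion above):

# inspect the rollout-related status fields of the Advanced DaemonSet
kubectl get daemonsets.apps.kruise.io ads-test -o jsonpath='{.status}'

# list the pods together with their image, so old and new generations are visible side by side
kubectl get pods -l name=fluentd-elasticsearch \
  -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
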
aaronseahyh added the kind/bug label on Jun 30, 2022

FillZpp (Member) commented Jun 30, 2022

@aaronseahyh Thanks for reporting. I reproduced your case; it turns out to be a bug in the minReadySeconds handling, which lacks an enqueueAfter for the surging pods.

aaronseahyh (Author) commented

@FillZpp Thank you for the fix. I see that the PR has been merged. Will this be included in an upcoming minor release?

FillZpp (Member) commented Jul 5, 2022

> @FillZpp Thank you for the fix. I see that the PR has been merged. Will this be included in an upcoming minor release?

Yeah, the new v1.2.1 is coming.
