k8s_drain not waiting for pod deletion if there is only one pod to evict #769

Closed
OttaviaB opened this issue Jul 31, 2024 · 0 comments · Fixed by #770

Comments

@OttaviaB
Contributor

SUMMARY

In the k8s_drain module, the result of the draining process is never checked if there is only one pod to evict from the node.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

k8s_drain

ANSIBLE VERSION
ansible [core 2.15.12]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/venvs/ansible/lib64/python3.9/site-packages/ansible
  ansible collection location = ~/.ansible/collections:/usr/share/ansible/collections
  executable location = ~/venvs/ansible/bin/ansible
  python version = 3.9.19 (main, Mar 22 2024, 21:01:47) [GCC] (~/venvs/ansible/bin/python3.9)
  jinja version = 3.1.4
  libyaml = True
COLLECTION VERSION
# ~/.ansible/collections/ansible_collections
Collection                    Version
----------------------------- -------
kubernetes.core               5.0.0
CONFIGURATION
CALLBACKS_ENABLED(~/.ansible.cfg) = ['ansible.posix.profile_tasks']
CONFIG_FILE() = ~/.ansible.cfg
PAGER(env: PAGER) = less
OS / ENVIRONMENT
  • Ansible controller: SLES 15-SP5
  • Remote hosts: SLES 15-SP5
  • RKE2 Version: v1.28.10+rke2r1
STEPS TO REPRODUCE
  • deploy a single pod running on only one node
    • The problem is best observed with a pod that takes a long time to terminate,
    • e.g. a deployment like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: draintest
  namespace: kube-public
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: draintaints
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: draintaints
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - preference:
                matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - <node_name>
              weight: 100
      containers:
        - command:
            - bash
            - '-c'
            - |
              trap 'echo -e "\nShutting down..."; sleep 60; exit' SIGTERM
              while true; do 
                  echo "Hi, I'm Steve!";
                  sleep 1
              done
          image: <registry>/shell:v2.0
          imagePullPolicy: IfNotPresent
          name: main
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsGroup: 12345
            runAsNonRoot: true
            runAsUser: 12345
            seccompProfile:
              type: RuntimeDefault
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 200
  • drain this node with a playbook like this:
---
- hosts: "{{ target }}"
  serial: 1
  gather_facts: false
  vars:
    kubeconfig_dir: "~/.kubeconfigs"
  tasks:
    - name: Drain node
      kubernetes.core.k8s_drain:
        kubeconfig: "{{ kubeconfig_path }}"
        name: "{{ inventory_hostname }}"
        delete_options:
          ignore_daemonsets: true
          delete_emptydir_data: true
          wait_timeout: 100
          terminate_grace_period: 30
          force: true
      delegate_to: localhost
EXPECTED RESULTS

k8s_drain should wait until the pod is evicted successfully or run into a timeout.

ACTUAL RESULTS

k8s_drain immediately reports the status "changed", regardless of whether the pod was actually evicted.

PLAY [Test] ****************************************************************************************************************************************************************************

TASK [Drain node] **************************************************************************************************************************************************************************
Montag 15 Juli 2024  15:48:57 +0200 (0:00:00.047)       0:00:00.047 *********** 
[WARNING]: cannot delete mirror Pods using API server: kube-system/kube-proxy-<node_name>.
changed: [<node_name> -> localhost]

PLAY RECAP *********************************************************************************************************************************************************************************
<node_name>             : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Montag 15 Juli 2024  15:48:58 +0200 (0:00:01.202)       0:00:01.249 *********** 
=============================================================================== 
Drain node -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.20s

Even though the pod needs 60 seconds to terminate, k8s_drain reports the deletion after only 2 seconds, while the pod is still running.
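
One way to see this (a hedged verification sketch, not part of the original report) is to list the pods still scheduled on the node immediately after the drain task returns, for example with the kubernetes Python client; the draintest pod is still listed even though the module has already reported "changed":

from kubernetes import client, config

# Assumes KUBECONFIG points at the same cluster the playbook drained.
config.load_kube_config()
v1 = client.CoreV1Api()
pods = v1.list_pod_for_all_namespaces(field_selector="spec.nodeName=<node_name>")
for pod in pods.items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)
# The draintest pod keeps showing up for roughly another 60 seconds.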

patchback bot pushed a commit that referenced this issue Dec 11, 2024
SUMMARY
Fixes #769 .
k8s_drain was not checking if a pod has been deleted when there was only one pod on the node to be drained.
The list of pods, pods, was being "popped" before the first iteration of the while loop:
        pod = pods.pop()
        while (_elapsed_time() < wait_timeout or wait_timeout == 0) and pods:
When pods contains only one element, the while loop is skipped.
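
For clarity, here is a minimal, self-contained sketch of that control flow. _elapsed_time, pod_is_gone and the two wait functions are simplified stand-ins (the real module checks the Kubernetes API), and the "fixed" variant only illustrates one possible correction, not necessarily the exact change merged in #770:

import time

start = time.monotonic()

def _elapsed_time():
    return time.monotonic() - start

def pod_is_gone(pod):
    return False  # stand-in: pretend the pod is still terminating

def wait_buggy(pods, wait_timeout):
    pod = pods.pop()  # with a single pod, 'pods' is now empty, so ...
    while (_elapsed_time() < wait_timeout or wait_timeout == 0) and pods:
        time.sleep(1)  # ... this body never runs and deletion is never verified
    return pod

def wait_fixed(pods, wait_timeout):
    # keep each pod in the list until it is actually observed gone
    while pods and (_elapsed_time() < wait_timeout or wait_timeout == 0):
        if pod_is_gone(pods[-1]):
            pods.pop()
        else:
            time.sleep(1)
    return pods  # a non-empty list here means the wait timed out

wait_buggy(["draintest"], wait_timeout=100)  # returns immediately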

ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME

k8s_drain

Reviewed-by: Mike Graves <mgraves@redhat.com>
(cherry picked from commit 4c305e7)
softwarefactory-project-zuul bot pushed a commit that referenced this issue Dec 11, 2024
This is a backport of PR #770 as merged into main (4c305e7).
SUMMARY
Fixes #769 .
k8s_drain was not checking if a pod has been deleted when there was only one pod on the node to be drained.
The list of pods, pods, was being "popped" before the first iteration of the while loop:
        pod = pods.pop()
        while (_elapsed_time() < wait_timeout or wait_timeout == 0) and pods:
When pods contains only one element, the while loop is skipped.


ISSUE TYPE


Bugfix Pull Request

COMPONENT NAME

k8s_drain
softwarefactory-project-zuul bot pushed a commit that referenced this issue Jan 20, 2025
SUMMARY
Version 3.3.0 of the ansible collection kubernetes.core comes with several improvements and bugfixes.
ISSUE TYPE

New release pull request

Changelog
Minor Changes

k8s_drain - Improve error message for pod disruption budget when draining a node (#797).

Bugfixes

helm - Helm version checks did not support RC versions. They now accept any version tags. (#745).
helm_pull - Apply no_log=True to pass_credentials to silence false positive warning. (#796).
k8s_drain - Fix k8s_drain does not wait for single pod (#769).
k8s_drain - Fix k8s_drain runs into a timeout when evicting a pod which is part of a stateful set  (#792).
kubeconfig option should not appear in module invocation log (#782).
kustomize - kustomize plugin fails with deprecation warnings (#639).
waiter - Fix waiting for daemonset when desired number of pods is 0. (#756).

ADDITIONAL INFORMATION
Collection kubernetes.core version 3.3.0 is compatible with ansible-core>=2.14.0

Reviewed-by: Alina Buzachis
Reviewed-by: Yuriy Novostavskiy
Reviewed-by: Mike Graves <mgraves@redhat.com>
This was referenced Jan 20, 2025
softwarefactory-project-zuul bot pushed a commit that referenced this issue Jan 20, 2025
SUMMARY
This release comes with the new module helm_registry_auth, improvements to the error messages in the k8s_drain module, a new parameter insecure_registry for the helm_template module, and several bug fixes.
ISSUE TYPE

New release pull request

Changelog
Minor Changes

Bump version of ansible-lint to minimum 24.7.0 (#765).
Parameter insecure_registry added to helm_template as equivalent of insecure-skip-tls-verify (#805).
connection/kubectl.py - Added an example of using the kubectl connection plugin to the documentation (#741).
k8s_drain - Improve error message for pod disruption budget when draining a node (#797).

Bugfixes

helm - Helm version checks did not support RC versions. They now accept any version tags. (#745).
helm_pull - Apply no_log=True to pass_credentials to silence false positive warning. (#796).
k8s_drain - Fix k8s_drain does not wait for single pod (#769).
k8s_drain - Fix k8s_drain runs into a timeout when evicting a pod which is part of a stateful set  (#792).
kubeconfig option should not appear in module invocation log (#782).
kustomize - kustomize plugin fails with deprecation warnings (#639).
waiter - Fix waiting for daemonset when desired number of pods is 0. (#756).

New Modules

helm_registry_auth - Helm registry authentication module

ADDITIONAL INFORMATION
Collection kubernetes.core version 3.1.0 is compatible with ansible-core>=2.15.0

Reviewed-by: Mike Graves <mgraves@redhat.com>
yurnov added a commit to yurnov/kubernetes.core that referenced this issue Jan 20, 2025
SUMMARY
This release comes with the new module helm_registry_auth, improvements to the error messages in the k8s_drain module, a new parameter insecure_registry for the helm_template module, and several bug fixes.
ISSUE TYPE

New release pull request

Changelog
Minor Changes

Bump version of ansible-lint to minimum 24.7.0 (ansible-collections#765).
Parameter insecure_registry added to helm_template as equivalent of insecure-skip-tls-verify (ansible-collections#805).
connection/kubectl.py - Added an example of using the kubectl connection plugin to the documentation (ansible-collections#741).
k8s_drain - Improve error message for pod disruption budget when draining a node (ansible-collections#797).

Bugfixes

helm - Helm version checks did not support RC versions. They now accept any version tags. (ansible-collections#745).
helm_pull - Apply no_log=True to pass_credentials to silence false positive warning. (ansible-collections#796).
k8s_drain - Fix k8s_drain does not wait for single pod (ansible-collections#769).
k8s_drain - Fix k8s_drain runs into a timeout when evicting a pod which is part of a stateful set  (ansible-collections#792).
kubeconfig option should not appear in module invocation log (ansible-collections#782).
kustomize - kustomize plugin fails with deprecation warnings (ansible-collections#639).
waiter - Fix waiting for daemonset when desired number of pods is 0. (ansible-collections#756).

New Modules

helm_registry_auth - Helm registry authentication module

ADDITIONAL INFORMATION
Collection kubernetes.core version 3.1.0 is compatible with ansible-core>=2.15.0

Reviewed-by: Mike Graves <mgraves@redhat.com>