k8s_drain not waiting for pod deletion if there is only one pod to evict #769

Closed
OttaviaB opened this issue Jul 31, 2024 · 0 comments · Fixed by #770

Comments

@OttaviaB
Contributor

SUMMARY

In the k8s_drain module, the result of the draining process is never checked if there is only one pod to evict from the node.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

k8s_drain

ANSIBLE VERSION
ansible [core 2.15.12]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/venvs/ansible/lib64/python3.9/site-packages/ansible
  ansible collection location = ~/.ansible/collections:/usr/share/ansible/collections
  executable location = ~/venvs/ansible/bin/ansible
  python version = 3.9.19 (main, Mar 22 2024, 21:01:47) [GCC] (~/venvs/ansible/bin/python3.9)
  jinja version = 3.1.4
  libyaml = True
COLLECTION VERSION
# ~/.ansible/collections/ansible_collections
Collection                    Version
----------------------------- -------
kubernetes.core               5.0.0
CONFIGURATION
CALLBACKS_ENABLED(~/.ansible.cfg) = ['ansible.posix.profile_tasks']
CONFIG_FILE() = ~/.ansible.cfg
PAGER(env: PAGER) = less
OS / ENVIRONMENT
  • Ansible controller: SLES 15-SP5
  • Remote hosts: SLES 15-SP5
  • RKE2 Version: v1.28.10+rke2r1
STEPS TO REPRODUCE
  • deploy a single pod running on only one node
    • The problem is best observed with a pod that takes a long time to terminate,
    • e.g. a deployment like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: draintest
  namespace: kube-public
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: draintaints
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: draintaints
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - preference:
                matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - <node_name>
              weight: 100
      containers:
        - command:
            - bash
            - '-c'
            - |
              trap 'echo -e "\nShutting down..."; sleep 60; exit' SIGTERM
              while true; do 
                  echo "Hi, I'm Steve!";
                  sleep 1
              done
          image: <registry>/shell:v2.0
          imagePullPolicy: IfNotPresent
          name: main
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsGroup: 12345
            runAsNonRoot: true
            runAsUser: 12345
            seccompProfile:
              type: RuntimeDefault
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 200
  • drain this node with a playbook like this:
---
- hosts: "{{ target }}"
  serial: 1
  gather_facts: false
  vars:
    kubeconfig_dir: "~/.kubeconfigs"
  tasks:
    - name: Drain node
      kubernetes.core.k8s_drain:
        kubeconfig: "{{ kubeconfig_path }}"
        name: "{{ inventory_hostname }}"
        delete_options:
          ignore_daemonsets: true
          delete_emptydir_data: true
          wait_timeout: 100
          terminate_grace_period: 30
          force: true
      delegate_to: localhost
EXPECTED RESULTS

k8s_drain should wait until the pod is evicted successfully or run into a timeout.

ACTUAL RESULTS

k8s_drain immediately reports the status "changed", regardless of whether the pod was actually evicted.

PLAY [Test] ****************************************************************************************************************************************************************************

TASK [Drain node] **************************************************************************************************************************************************************************
Montag 15 Juli 2024  15:48:57 +0200 (0:00:00.047)       0:00:00.047 *********** 
[WARNING]: cannot delete mirror Pods using API server: kube-system/kube-proxy-<node_name>.
changed: [<node_name> -> localhost]

PLAY RECAP *********************************************************************************************************************************************************************************
<node_name>             : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Montag 15 Juli 2024  15:48:58 +0200 (0:00:01.202)       0:00:01.249 *********** 
=============================================================================== 
Drain node -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1.20s

Even though the pod needs 60 seconds to terminate, k8s_drain reports the deletion after only 2 seconds, while the pod is still running.
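
One way to see this (a hedged verification sketch, not part of the original report) is to list the pods still scheduled on the node immediately after the drain task returns, for example with the kubernetes Python client; the draintest pod is still listed even though the module has already reported "changed":

from kubernetes import client, config

# Assumes KUBECONFIG points at the same cluster the playbook drained.
config.load_kube_config()
v1 = client.CoreV1Api()
pods = v1.list_pod_for_all_namespaces(field_selector="spec.nodeName=<node_name>")
for pod in pods.items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)
# The draintest pod keeps showing up for roughly another 60 seconds.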

patchback bot pushed a commit that referenced this issue Dec 11, 2024
SUMMARY
Fixes #769 .
k8s_drain was not checking if a pod has been deleted when there was only one pod on the node to be drained.
The list of pods, pods, was being "popped" before the first iteration of the while loop:
        pod = pods.pop()
        while (_elapsed_time() < wait_timeout or wait_timeout == 0) and pods:
When pods contains only one element, the while loop is skipped.
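
For clarity, here is a minimal, self-contained sketch of that control flow. _elapsed_time, pod_is_gone and the two wait functions are simplified stand-ins (the real module checks the Kubernetes API), and the "fixed" variant only illustrates one possible correction, not necessarily the exact change merged in #770:

import time

start = time.monotonic()

def _elapsed_time():
    return time.monotonic() - start

def pod_is_gone(pod):
    return False  # stand-in: pretend the pod is still terminating

def wait_buggy(pods, wait_timeout):
    pod = pods.pop()  # with a single pod, 'pods' is now empty, so ...
    while (_elapsed_time() < wait_timeout or wait_timeout == 0) and pods:
        time.sleep(1)  # ... this body never runs and deletion is never verified
    return pod

def wait_fixed(pods, wait_timeout):
    # keep each pod in the list until it is actually observed gone
    while pods and (_elapsed_time() < wait_timeout or wait_timeout == 0):
        if pod_is_gone(pods[-1]):
            pods.pop()
        else:
            time.sleep(1)
    return pods  # a non-empty list here means the wait timed out

wait_buggy(["draintest"], wait_timeout=100)  # returns immediately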

ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME

k8s_drain

Reviewed-by: Mike Graves <mgraves@redhat.com>
(cherry picked from commit 4c305e7)
softwarefactory-project-zuul bot pushed a commit that referenced this issue Dec 11, 2024
This is a backport of PR #770 as merged into main (4c305e7).
SUMMARY
Fixes #769 .
k8s_drain was not checking if a pod has been deleted when there was only one pod on the node to be drained.
The list of pods, pods, was being "popped" before the first iteration of the while loop:
        pod = pods.pop()
        while (_elapsed_time() < wait_timeout or wait_timeout == 0) and pods:
When pods contains only one element, the while loop is skipped.


ISSUE TYPE


Bugfix Pull Request

COMPONENT NAME

k8s_drain
softwarefactory-project-zuul bot pushed a commit that referenced this issue Jan 20, 2025
SUMMARY
Version 3.3.0 of the ansible collection kubernetes.core comes with several improvements and bugfixes.
ISSUE TYPE

New release pull request

Changelog
Minor Changes

k8s_drain - Improve error message for pod disruption budget when draining a node (#797).

Bugfixes

helm - Helm version checks did not support RC versions. They now accept any version tags. (#745).
helm_pull - Apply no_log=True to pass_credentials to silence false positive warning. (#796).
k8s_drain - Fix k8s_drain does not wait for single pod (#769).
k8s_drain - Fix k8s_drain runs into a timeout when evicting a pod which is part of a stateful set  (#792).
kubeconfig option should not appear in module invocation log (#782).
kustomize - kustomize plugin fails with deprecation warnings (#639).
waiter - Fix waiting for daemonset when desired number of pods is 0. (#756).

ADDITIONAL INFORMATION
Collection kubernetes.core version 3.3.0 is compatible with ansible-core>=2.14.0

Reviewed-by: Alina Buzachis
Reviewed-by: Yuriy Novostavskiy
Reviewed-by: Mike Graves <mgraves@redhat.com>
This was referenced Jan 20, 2025
softwarefactory-project-zuul bot pushed a commit that referenced this issue Jan 20, 2025
SUMMARY
This release comes with the new module helm_registry_auth, improvements to the error messages in the k8s_drain module, a new parameter insecure_registry for the helm_template module, and several bug fixes.
ISSUE TYPE

New release pull request

Changelog
Minor Changes

Bump version of ansible-lint to minimum 24.7.0 (#765).
Parameter insecure_registry added to helm_template as equivalent of insecure-skip-tls-verify (#805).
connection/kubectl.py - Added an example of using the kubectl connection plugin to the documentation (#741).
k8s_drain - Improve error message for pod disruption budget when draining a node (#797).

Bugfixes

helm - Helm version checks did not support RC versions. They now accept any version tags. (#745).
helm_pull - Apply no_log=True to pass_credentials to silence false positive warning. (#796).
k8s_drain - Fix k8s_drain does not wait for single pod (#769).
k8s_drain - Fix k8s_drain runs into a timeout when evicting a pod which is part of a stateful set  (#792).
kubeconfig option should not appear in module invocation log (#782).
kustomize - kustomize plugin fails with deprecation warnings (#639).
waiter - Fix waiting for daemonset when desired number of pods is 0. (#756).

New Modules

helm_registry_auth - Helm registry authentication module

ADDITIONAL INFORMATION
Collection kubernetes.core version 3.1.0 is compatible with ansible-core>=2.15.0

Reviewed-by: Mike Graves <mgraves@redhat.com>
yurnov added a commit to yurnov/kubernetes.core that referenced this issue Jan 20, 2025
SUMMARY
This release comes with the new module helm_registry_auth, improvements to the error messages in the k8s_drain module, a new parameter insecure_registry for the helm_template module, and several bug fixes.
ISSUE TYPE

New release pull request

Changelog
Minor Changes

Bump version of ansible-lint to minimum 24.7.0 (ansible-collections#765).
Parameter insecure_registry added to helm_template as equivalent of insecure-skip-tls-verify (ansible-collections#805).
connection/kubectl.py - Added an example of using the kubectl connection plugin to the documentation (ansible-collections#741).
k8s_drain - Improve error message for pod disruption budget when draining a node (ansible-collections#797).

Bugfixes

helm - Helm version checks did not support RC versions. They now accept any version tags. (ansible-collections#745).
helm_pull - Apply no_log=True to pass_credentials to silence false positive warning. (ansible-collections#796).
k8s_drain - Fix k8s_drain does not wait for single pod (ansible-collections#769).
k8s_drain - Fix k8s_drain runs into a timeout when evicting a pod which is part of a stateful set  (ansible-collections#792).
kubeconfig option should not appear in module invocation log (ansible-collections#782).
kustomize - kustomize plugin fails with deprecation warnings (ansible-collections#639).
waiter - Fix waiting for daemonset when desired number of pods is 0. (ansible-collections#756).

New Modules

helm_registry_auth - Helm registry authentication module

ADDITIONAL INFORMATION
Collection kubernetes.core version 3.1.0 is compatible with ansible-core>=2.15.0

Reviewed-by: Mike Graves <mgraves@redhat.com>