bug: rke2 upgrade, agent nodes should be upgraded after all the master nodes #102

jakuzure · 2022-08-30T08:35:21Z

Summary

I upgraded RKE2 from v1.22.9 to v1.23.9. The upgrade itself worked fine, but I noticed that some worker nodes were upgraded in between the master nodes, which goes against the RKE2 recommendation:

Note: Upgrade the server nodes first, one at a time. Once all servers have been upgraded, you may then upgrade agent nodes.

See https://docs.rke2.io/upgrade/basic_upgrade/

Ansible Output:

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-0] ***
changed: [platform-rancher-master-k8s-master-0]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2(10.10.50.103)]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-1] ***
changed: [platform-rancher-master-k8s-master-1]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-1] ***
changed: [platform-rancher-master-k8s-worker-1]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-2] ***
changed: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-0] ***
skipping: [platform-rancher-master-k8s-worker-0]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-0] ***

Issue Type

Bug Report

Ansible Version

ansible [core 2.12.7]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.10/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.5 (main, Jul 13 2022, 05:45:22) [GCC 10.2.1 20210110]
  jinja version = 3.1.2
  libyaml = True

Steps to Reproduce

Trigger an RKE2 upgrade, e.g. from v1.22.9 to v1.23.9.

Expected Results

All master (server) nodes should be upgraded first, one at a time, and only then the worker (agent) nodes.

Actual Results

Nodes are upgraded in a seemingly random order. In the output above, worker-1 was restarted after master-1 but before master-2.
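
To make the expected ordering concrete, here is a minimal Python sketch, not the role's actual logic. It takes the hostnames in the order they appear in the Ansible output and reorders them so every server comes before any agent. The helper names (`upgrade_order`, `first_violation`) and the rule of classifying a host by the `-k8s-master-` substring are assumptions made for illustration only.

```python
def upgrade_order(hosts, is_server):
    """Return hosts with all servers first, then agents.

    Relative order inside each group is preserved.
    """
    servers = [h for h in hosts if is_server(h)]
    agents = [h for h in hosts if not is_server(h)]
    return servers + agents


def first_violation(hosts, is_server):
    """Return the first agent that is followed by a server, or None."""
    for index, host in enumerate(hosts):
        if not is_server(host) and any(is_server(h) for h in hosts[index + 1:]):
            return host
    return None


def looks_like_server(host):
    # Assumption: servers are identified by "-k8s-master-" in the hostname.
    # A plain "-master-" check would match every host in this cluster,
    # because the common prefix is "platform-rancher-master".
    return "-k8s-master-" in host


# Order in which nodes were restarted in the output above.
observed = [
    "platform-rancher-master-k8s-master-0",
    "platform-rancher-master-k8s-master-1",
    "platform-rancher-master-k8s-worker-1",
    "platform-rancher-master-k8s-master-2",
    "platform-rancher-master-k8s-worker-0",
]

expected = upgrade_order(observed, looks_like_server)
offender = first_violation(observed, looks_like_server)
```

With the observed order, `offender` is the worker-1 host, and `expected` lists master-0, master-1 and master-2 before either worker.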

jakuzure added the "bug" label (Something isn't working) on Aug 30, 2022
