bug: rke2 upgrade, agent nodes should be upgraded after all the master nodes #102

Open
jakuzure opened this issue Aug 30, 2022 · 0 comments
Labels
bug Something isn't working

Comments

Contributor

jakuzure commented Aug 30, 2022

Summary

I upgraded RKE2 from v1.22.9 to v1.23.9. The upgrade itself worked fine, but I noticed that some worker nodes were upgraded in between the master nodes, which goes against the RKE2 recommendations:

Note: Upgrade the server nodes first, one at a time. Once all servers have been upgraded, you may then upgrade agent nodes.

See https://docs.rke2.io/upgrade/basic_upgrade/

Ansible Output:

TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-0] ***
changed: [platform-rancher-master-k8s-master-0]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-master-0 -> platform-rancher-master-k8s-master-2(10.10.50.103)]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-0] ***
skipping: [platform-rancher-master-k8s-master-0]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-1] ***
changed: [platform-rancher-master-k8s-master-1]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-1] ***
skipping: [platform-rancher-master-k8s-master-1]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-1] ***
changed: [platform-rancher-master-k8s-worker-1]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
FAILED - RETRYING: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2]: Wait for all nodes to be ready again (100 retries left).
ok: [platform-rancher-master-k8s-worker-1 -> platform-rancher-master-k8s-master-2(10.10.50.103)]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-worker-1] ***
skipping: [platform-rancher-master-k8s-worker-1]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-2] ***
changed: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Wait for all nodes to be ready again] *********************
ok: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Uncordon the node platform-rancher-master-k8s-master-2] ***
skipping: [platform-rancher-master-k8s-master-2]
TASK [lablabs.rke2 : Cordon and Drain the node platform-rancher-master-k8s-worker-0] ***
skipping: [platform-rancher-master-k8s-worker-0]
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-0] ***

Issue Type

Bug Report

Ansible Version

ansible [core 2.12.7]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.10/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.5 (main, Jul 13 2022, 05:45:22) [GCC 10.2.1 20210110]
  jinja version = 3.1.2
  libyaml = True

Steps to Reproduce

Trigger an RKE2 upgrade, e.g. from v1.22.9 to v1.23.9.

Expected Results

All master (server) nodes should be upgraded first, one at a time, and only then the worker (agent) nodes.
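
The intended ordering can be sketched in a few lines of Python. This is only an illustration of "servers first, then agents", not how the role is implemented; the `upgrade_order` helper and the `"server"`/`"agent"` role labels are hypothetical names, and the hostnames are the ones from the Ansible output in this report.

```python
def upgrade_order(nodes):
    """Return hostnames with all servers first, then all agents.

    `nodes` is a list of (hostname, role) tuples, where role is
    "server" or "agent" (hypothetical labels used only in this sketch).
    sorted() is stable, so the relative order inside each role is kept.
    """
    # Servers get sort key False (0), agents get True (1).
    return [name for name, role in sorted(nodes, key=lambda n: n[1] != "server")]


# Inventory in an arbitrary, mixed order.
nodes = [
    ("platform-rancher-master-k8s-master-0", "server"),
    ("platform-rancher-master-k8s-worker-1", "agent"),
    ("platform-rancher-master-k8s-master-1", "server"),
    ("platform-rancher-master-k8s-worker-0", "agent"),
    ("platform-rancher-master-k8s-master-2", "server"),
]
order = upgrade_order(nodes)
```

With this ordering, no agent is restarted until the last server has been processed.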

Actual Results

Nodes are upgraded in a seemingly random order: in the output above, worker-1 is restarted after master-1 and before master-2.
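
To check a run for this problem, the restart order can be pulled out of the Ansible output. The following is a rough sketch, not part of the role: `restart_sequence`, `agents_restarted_early`, and `is_server` are hypothetical helpers, and `is_server` assumes that server hostnames contain `-k8s-master-`, which holds for the hostnames in this report but may not hold elsewhere. The `log` string contains only the "Restart RKE2 service" task lines copied from the output above.

```python
import re

# Matches the task header printed for each node's service restart.
RESTART_RE = re.compile(
    r"TASK \[lablabs\.rke2 : Restart RKE2 service on ([^\]\s]+)\]"
)


def restart_sequence(log_text):
    """Return hostnames in the order their RKE2 service was restarted."""
    return RESTART_RE.findall(log_text)


def is_server(host):
    """Assumption: server hostnames contain '-k8s-master-'."""
    return "-k8s-master-" in host


def agents_restarted_early(sequence):
    """Return agents that were restarted before the last server restart."""
    server_positions = [i for i, h in enumerate(sequence) if is_server(h)]
    if not server_positions:
        return []
    last_server = server_positions[-1]
    return [h for h in sequence[:last_server] if not is_server(h)]


log = """\
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-0] ***
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-1] ***
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-1] ***
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-master-2] ***
TASK [lablabs.rke2 : Restart RKE2 service on platform-rancher-master-k8s-worker-0] ***
"""

sequence = restart_sequence(log)
early = agents_restarted_early(sequence)
```

An empty `early` list would mean every agent was restarted after all servers; for the run reported here it contains worker-1.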
@jakuzure jakuzure added the bug Something isn't working label Aug 30, 2022