kubeadm upgrade fails because of virtual nodes #123

Open

dimm0 opened this issue Sep 20, 2021 · 5 comments

Labels: bug, documentation

@dimm0
Collaborator

dimm0 commented Sep 20, 2021

  • Nodes have no version, which confuses the upgrade script
[upgrade/version] FATAL: the --version argument is invalid due to these errors:

	- couldn't parse kubelet version

Can be bypassed by passing the --force flag (a sketch of this follows at the end of this comment)
To see the stack trace of this error execute with --v=5 or higher
  • Nodes go on and off, which makes the install script unhappy ("control plane is not ready")
  • In the end upgrade script fails:
[apiclient] Found 1 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upgrade/postupgrade] Applying label node-role.kubernetes.io/control-plane='' to Nodes with label node-role.kubernetes.io/master='' (deprecated)
timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

To finish the upgrade, I had to delete all virtual nodes and disable Admiralty.
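
For reference, a minimal sketch of the --force bypass mentioned above; v1.X.Y is a hypothetical placeholder version, not the one used here:

# --force lets kubeadm proceed even though the virtual nodes report no kubelet version,
# and --v=5 prints the stack trace mentioned in the log output above.
kubeadm upgrade apply v1.X.Y --force --v=5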

@adrienjt
Contributor

  1. This looks like kubeadm upgrade, is that right? First of all, is there a way to exclude nodes from the upgrade process? It doesn't make sense to upgrade virtual nodes.

  2. Nodes have no version

    Virtual nodes have no status.nodeInfo.kubeletVersion because they could be backed by clusters running multiple versions (e.g., being upgraded).

  3. Nodes go on and off

    What do you mean?

  4. In the end upgrade script fails

    I wonder which condition the "timed out waiting for the condition" error refers to. Looking at the snippet, this might be because node-role.kubernetes.io/master= is aggregated onto virtual nodes, and the node-role.kubernetes.io/control-plane= label is removed by Admiralty as soon as it's applied. This could be fixed with the new target option spec.excludedLabelsRegexp: ^node-role\.kubernetes\.io/master=$ (a sketch of setting it follows below).
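
A minimal sketch of setting that option, assuming it is available in your Admiralty version and that Target is the namespaced targets.multicluster.admiralty.io resource; the name and namespace here are hypothetical:

# Merge-patch the (hypothetical) Target so the master role label is excluded from aggregation.
kubectl patch targets.multicluster.admiralty.io my-target -n my-namespace \
  --type merge \
  -p '{"spec":{"excludedLabelsRegexp":"^node-role\\.kubernetes\\.io/master=$"}}'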

@adrienjt adrienjt added the bug Something isn't working label Sep 20, 2021
@adrienjt adrienjt changed the title Federated virtual nodes are preventing cluster upgrades kubeadm upgrade fails because of virtual nodes Sep 20, 2021
@dimm0
Collaborator Author

dimm0 commented Sep 20, 2021

  1. This looks like kubeadm upgrade, is that right? First of all, is there a way to exclude nodes from the upgrade process? It doesn't make sense to upgrade virtual nodes.

Not that I know of...

  3. Nodes go on and off

This is from my previous attempts to upgrade. I think it hung when the nodes were not ready.

  4. In the end upgrade script fails

    I wonder which condition the "timed out waiting for the condition" error refers to. Looking at the snippet, this might be because node-role.kubernetes.io/master= is aggregated onto virtual nodes, and the node-role.kubernetes.io/control-plane= label is removed by Admiralty as soon as it's applied. This could be fixed with the new target option spec.excludedLabelsRegexp: ^node-role\.kubernetes\.io/master=$.

I'd think that's the reason.

Anyway, I think it's worth adding some recommendations to the docs on how to upgrade the cluster (and at least testing this).

@adrienjt
Contributor

I'd love your help with this:

  • test kubeadm upgrade after configuring targets with spec.excludedLabelsRegexp: ^node-role\.kubernetes\.io/master=$
  • make sure virtual nodes remain ready during kubeadm upgrade (a spot-check is sketched below)
  • add documentation page in operator guide about running kubeadm upgrade in source/management cluster
  • consider excluding node-role.kubernetes.io/master= and node-role.kubernetes.io/control-plane= from virtual node label aggregation by default
  • add e2e test with kubeadm
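
For the first two items, a rough spot-check sketch (nothing assumed beyond the label keys and kubeletVersion field discussed above):

# Watch node readiness while the upgrade runs; Admiralty virtual nodes report no kubelet version.
kubectl get nodes -o wide -w

# With the exclusion in place, only real control-plane nodes (non-empty kubelet version)
# should still carry the deprecated master role label, not virtual nodes.
kubectl get nodes -l node-role.kubernetes.io/master= \
  -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion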

@dimm0
Collaborator Author

dimm0 commented Dec 16, 2021

I'm about to do another upgrade.
I tried adding the spec.excludedLabelsRegexp param to 3 of my targets, and 1 of them keeps respawning with the master label. The other 2 are fine. Any tips?

In that one, a federated pod is running.
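
As a starting point, a debugging sketch (the Target resource name, namespace, and virtual node name are hypothetical placeholders):

# Confirm the exclusion regexp was actually set on the misbehaving Target.
kubectl get targets.multicluster.admiralty.io my-target -n my-namespace \
  -o jsonpath='{.spec.excludedLabelsRegexp}{"\n"}'

# Watch the corresponding virtual node's labels to see when the master label reappears.
kubectl get node <virtual-node-name> --show-labels -w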

@dimm0
Collaborator Author

dimm0 commented Dec 16, 2021

make sure virtual nodes remain ready during kubeadm upgrade

The control plane goes offline during the upgrade, so I can't really do that.
