Develop mechanism to apply a new image for Day 2 upgrades #621

Open
eak13 opened this issue Aug 27, 2021 · 5 comments

Comments

@eak13

eak13 commented Aug 27, 2021

Problem description
As a follow-on to the investigative work done in #603, #604, #605 & #606, develop a process to apply a new image to an existing node so that containerd, Dell i40e drivers, the OS/kernel, and Ubuntu can be upgraded or patched.

This requires CAPI to be upgraded to v1alpha4 and CAPM3 to be upgraded to v1alpha5, which is tracked in issue #518.

Proposed change
Reference: https://hackmd.io/@Pallav/BkU2FuWZY

  1. Generate a new QCOW image bundle with the upgrades/patches via Image Builder. Ensure the version catalog for your site is updated to reference the latest image. (Split out into a separate issue under Day 2 Operations: Generate new QCOW bundle with upgrades #622.)
  2. Deploy the Kube API Server in HA mode (Kube API Server VIP). This should already be in place but doesn't seem to be working; either correct it as part of this issue or break it out as a separate bug. (Separate issue created: Kube API server not deploying in HA mode, treasuremap#200.)
  3. Perform a rolling upgrade of all control plane and worker nodes with the new image. We need to verify how this can be done both when a cold standby spare control plane node is available for the rolling upgrade and when no standby node is available.
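
The two scenarios in step 3 differ mainly in the order of operations. The sketch below is only an illustration of that ordering, not the actual upgrade mechanism: `rolling_upgrade_plan`, the node names, and the step strings are all hypothetical, and it assumes nodes are replaced one at a time.

```python
from typing import List, Optional


def rolling_upgrade_plan(nodes: List[str], spare: Optional[str] = None) -> List[str]:
    """Return the ordered steps for a one-node-at-a-time upgrade (illustrative only).

    With a spare host, the new node is provisioned before an old one is removed.
    Without a spare, each node is removed first and then re-provisioned in place.
    """
    steps: List[str] = []
    free_host = spare
    for node in nodes:
        if free_host is not None:
            # Standby available: bring up the replacement first, then retire the old node.
            steps.append(f"provision new image on {free_host}")
            steps.append(f"drain and remove {node}")
            free_host = node  # the vacated host becomes the next spare
        else:
            # No standby: the cluster runs one node short while the host is re-imaged.
            steps.append(f"drain and remove {node}")
            steps.append(f"provision new image on {node}")
    return steps


if __name__ == "__main__":
    for step in rolling_upgrade_plan(["cp-0", "cp-1", "cp-2"], spare="spare-0"):
        print(step)
```

In the no-spare case the cluster is temporarily down one node at every step, which is why that path needs separate verification.
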
@eak13 added the enhancement (New feature or request) and triage (Needs evaluation by project members) labels on Aug 27, 2021
@eak13 added the epic (Features or large improvements reflected as a list of issues) label on Aug 30, 2021
@eak13 added this to the v2.2 milestone on Aug 30, 2021
@jezogwza removed the triage (Needs evaluation by project members) label on Sep 1, 2021
@sshiba

sshiba commented Sep 21, 2021

@eak13, I have self-assigned https://itrack.web.att.com/browse/CPVYGR-687, which references this issue. I also noticed that you and Larry will cover this subject in the design call scheduled for 09/23.

Should I leave CPVYGR-687 alone and focus on something else until your design topic is covered?

@lb4368 removed the epic (Features or large improvements reflected as a list of issues) label on Sep 28, 2021
@Arvinderpal
Contributor

On the topic of control-plane node upgrades, KCP recently added support for scale-in upgrades -- basically to handle the situation where there is no standby node available.

This is not documented too well; for example, the CAPI Book does not mention it. However, as noted in this design doc on KCP, you can set MaxSurge=0 in KCP to accomplish this. In this scenario, KCP will first take down a control-plane node before trying to provision a new one. Note that this is only applicable to clusters with 3 or more control-plane nodes.
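
As a rough sketch of what setting MaxSurge=0 could look like, the snippet below builds a merge patch for a KubeadmControlPlane object. It assumes the setting lives at `spec.rolloutStrategy.rollingUpdate.maxSurge`, which should be checked against the CAPI version in use; the helper name `kcp_scale_in_patch` and the replica check are illustrative and not part of any CAPI tooling.

```python
import json


def kcp_scale_in_patch(control_plane_replicas: int) -> str:
    """Build a JSON merge patch that asks KCP to scale in before scaling out.

    Assumption: the field path is spec.rolloutStrategy.rollingUpdate.maxSurge.
    """
    if control_plane_replicas < 3:
        # Scale-in upgrades only apply to clusters with 3 or more control-plane nodes.
        raise ValueError("scale-in upgrade needs at least 3 control-plane nodes")
    patch = {
        "spec": {
            "rolloutStrategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 0},
            }
        }
    }
    return json.dumps(patch, indent=2)


if __name__ == "__main__":
    print(kcp_scale_in_patch(3))
```

The resulting JSON could then be applied with whatever patching workflow the site already uses.
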

Also, on a related note, CAPM3 recently added the ability to reuse a node during an upgrade. This is useful for keeping the control-plane nodes on the same host during an upgrade.
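
For illustration, enabling node reuse might amount to setting a flag on the Metal3MachineTemplate manifest. The sketch below assumes the flag is `spec.nodeReuse`, which should be confirmed against the CAPM3 release in use; `with_node_reuse` and the sample manifest are hypothetical.

```python
import copy
from typing import Any, Dict


def with_node_reuse(template: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Metal3MachineTemplate manifest with node reuse enabled.

    Assumption: the CAPM3 flag is spec.nodeReuse. The input is left unmodified.
    """
    updated = copy.deepcopy(template)
    updated.setdefault("spec", {})["nodeReuse"] = True
    return updated


if __name__ == "__main__":
    manifest = {"kind": "Metal3MachineTemplate", "spec": {"template": {"spec": {}}}}
    print(with_node_reuse(manifest))
```
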

@sshiba

sshiba commented Nov 19, 2021

Here is the URL for the WIP documentation: https://hackmd.io/qSLa-qb5SBOuz5I-d6I1Pg.
Note that this document is an initial draft and has not been proofread.

@sshiba

sshiba commented Nov 19, 2021

I also documented how to set up the testing environment with Metal3-dev-env here.

@sshiba

sshiba commented Dec 1, 2021

@eak13, the HackMD document describing the upgrade process has been completed and is ready for your review.


5 participants