
feat (cluster): [day2-ops] node update configuration #403

Merged: 14 commits, Mar 21, 2024


@ferantivero (Contributor) commented Feb 22, 2024

  • remove the Kubernetes Reboot Daemon (Kured)
  • enable the cluster auto-upgrade channel (node-image) for automatic node image upgrades
  • enable the node OS auto-upgrade channel (NodeImage) for automatic OS security updates
  • add maintenance windows for Kubernetes version and node OS upgrades
  • add some initial guidance to the docs
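For anyone who prefers the CLI, the channel settings above could be applied to an existing cluster roughly as follows. This is a minimal sketch, assuming the az CLI's `--auto-upgrade-channel` and `--node-os-upgrade-channel` flags on `az aks update`; the PR itself makes these changes in cluster-stamp.bicep, and `configure_upgrade_channels` is a hypothetical helper. It uses NodeImage for the node OS channel, which is the combination this thread settles on.

```bash
#!/usr/bin/env bash
# Sketch only: az CLI equivalent of the upgrade-channel settings above.
# The PR configures these in cluster-stamp.bicep; this helper is hypothetical.
# Set AZ=echo to print the command instead of calling Azure.
set -euo pipefail

configure_upgrade_channels() {
  local cluster="$1" resource_group="$2"
  # node-image: AKS keeps node images current; Kubernetes version upgrades stay manual.
  # NodeImage: node OS updates are delivered as node image upgrades.
  "${AZ:-az}" aks update \
    --name "$cluster" \
    --resource-group "$resource_group" \
    --auto-upgrade-channel node-image \
    --node-os-upgrade-channel NodeImage
}
```

Usage: `configure_upgrade_channels "$AKS_CLUSTER_NAME" rg-bu0001a0008`, or prefix the call with `AZ=echo` for a dry run.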

Tested end to end:

```bash
# the cluster auto-upgrade channel is node-image, so only node images are upgraded automatically
# Kubernetes version upgrades for the AKS cluster remain manual
az aks show -n "$AKS_CLUSTER_NAME" -g rg-bu0001a0008 --query "autoUpgradeProfile"
```


```bash
# the maintenance window for this reference implementation (RI) recurs once a week
az aks maintenanceconfiguration list --cluster-name "$AKS_CLUSTER_NAME" -g rg-bu0001a0008
```
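To turn that listing into a pass/fail check, something like the following could work. It is a sketch, assuming the windows use the AKS-reserved names aksManagedAutoUpgradeSchedule and aksManagedNodeOSUpgradeSchedule (the thread doesn't show the names this RI used); `has_upgrade_windows` is a hypothetical helper that reads configuration names from stdin.

```bash
#!/usr/bin/env bash
# Sketch: check that both AKS-managed upgrade maintenance windows exist.
# Hypothetical helper; reads configuration names (one per line) from stdin.
set -euo pipefail

has_upgrade_windows() {
  local names required missing=0
  names="$(cat)"
  for required in aksManagedAutoUpgradeSchedule aksManagedNodeOSUpgradeSchedule; do
    # -x matches whole lines only, -F treats the name as a literal string
    if ! grep -qxF "$required" <<<"$names"; then
      echo "missing maintenance configuration: $required" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `az aks maintenanceconfiguration list --cluster-name "$AKS_CLUSTER_NAME" -g rg-bu0001a0008 --query "[].name" -o tsv | has_upgrade_windows`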


```bash
# kured is no longer installed
kubectl get all -n cluster-baseline-settings
```
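The kubectl output above is checked by eye; a scripted version might look like this. A sketch only: `assert_no_kured` is hypothetical, and it assumes Kured's resources carry "kured" in their names, as in its standard DaemonSet manifest.

```bash
#!/usr/bin/env bash
# Sketch: fail if any resource name containing "kured" is still present.
# Hypothetical helper; reads resource names (one per line) from stdin.
set -euo pipefail

assert_no_kured() {
  local leftovers
  # grep exits 1 when nothing matches, so tolerate that case explicitly
  leftovers="$(grep -i 'kured' || true)"
  if [[ -n "$leftovers" ]]; then
    echo "kured resources still present:" >&2
    echo "$leftovers" >&2
    return 1
  fi
}
```

Usage: `kubectl get all -n cluster-baseline-settings -o name | assert_no_kured`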

@ferantivero force-pushed the feature/192733_day2-node-updates-guidance branch 3 times, most recently from 094ae37 to 2218f2e on March 7, 2024 17:58
@ferantivero requested a review from skabou on March 7, 2024 18:00
Resolved (outdated) review threads: 01-prerequisites.md (2), 07-bootstrap-validation.md (1), cluster-stamp.bicep (2)
@skabou (Contributor) commented Mar 8, 2024

Some explanation: I couldn't get it to deploy as-is, so I had to make those two changes in cluster-stamp.bicep.

Apparently there's a known bug with the combination of SecurityPatch (node OS channel) and node-image (cluster channel), so we will need to go with NodeImage and node-image instead.

@skabou (Contributor) commented Mar 11, 2024

Here's some context on the bug:
https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-os-image#node-channel-known-bugs

"Currently, when you set the cluster auto-upgrade channel to node-image, it also automatically sets the node OS auto-upgrade channel to NodeImage. You can't change node OS auto-upgrade channel value if your cluster auto-upgrade channel is node-image. In order to set the node OS auto-upgrade channel value, check the cluster auto-upgrade channel value isn't node-image."

I know a handful of other changes were made to support the SecurityPatch preview feature, but those may not be necessary now.
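The known-bug rule quoted above can be encoded as a pre-deployment guard. This is a hypothetical `check_channel_combo` helper, not part of the PR or of any Azure tooling; it encodes only the documented behavior that a node-image cluster channel pins the node OS channel to NodeImage. With it, the SecurityPatch plus node-image combination that failed to deploy is rejected, and NodeImage plus node-image passes.

```bash
#!/usr/bin/env bash
# Sketch: pre-deployment guard for the node channel known bug.
# Hypothetical helper; args are the cluster channel and the node OS channel.
set -euo pipefail

check_channel_combo() {
  local cluster_channel="$1" node_os_channel="$2"
  # A node-image cluster channel forces the node OS channel to NodeImage.
  if [[ "$cluster_channel" == "node-image" && "$node_os_channel" != "NodeImage" ]]; then
    echo "node-image cluster channel forces node OS channel NodeImage (requested: $node_os_channel)" >&2
    return 1
  fi
}
```

Usage: `check_channel_combo node-image SecurityPatch` exits non-zero; the live values can be read from `az aks show --query "autoUpgradeProfile"`.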

@ferantivero force-pushed the feature/192733_day2-node-updates-guidance branch from 79940cb to 57b25b1 on March 11, 2024 14:49

@skabou (Contributor) commented Mar 11, 2024

@ferantivero Looking good! Can you add some guidance or an example of a maintenance window for the updates? Thanks!

For reference:
https://learn.microsoft.com/en-us/azure/architecture/operator-guides/aks/aks-upgrade-practices#automatic-node-image-upgrades
https://learn.microsoft.com/en-us/azure/aks/planned-maintenance#creating-a-maintenance-window

@ferantivero force-pushed the feature/192733_day2-node-updates-guidance branch 3 times, most recently from f2f0a97 to 5a4b3d5 on March 11, 2024 18:31

@ferantivero (Contributor Author) commented

Sure thing @skabou, we added maintenance configuration windows for both Kubernetes version and OS-level upgrades.

Done in 1c82e86.
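A weekly window setup along these lines is sketched below, using `az aks maintenanceconfiguration add` as described in the planned-maintenance doc linked above. The weekday, start time, 4-hour duration, and UTC offset are illustrative assumptions rather than the values committed in 1c82e86, and both helper functions are hypothetical. The names aksManagedAutoUpgradeSchedule and aksManagedNodeOSUpgradeSchedule are the ones AKS reserves for the cluster and node OS auto-upgrade windows.

```bash
#!/usr/bin/env bash
# Sketch: weekly maintenance windows for both kinds of automatic upgrades.
# Day, start time, duration, and UTC offset are illustrative assumptions.
# Set AZ=echo to print the commands instead of calling Azure.
set -euo pipefail

add_weekly_window() {
  local cluster="$1" resource_group="$2" config_name="$3"
  "${AZ:-az}" aks maintenanceconfiguration add \
    --resource-group "$resource_group" \
    --cluster-name "$cluster" \
    --name "$config_name" \
    --schedule-type Weekly \
    --day-of-week Sunday \
    --interval-weeks 1 \
    --start-time 00:00 \
    --duration 4 \
    --utc-offset +00:00
}

configure_maintenance_windows() {
  local cluster="$1" resource_group="$2"
  # Window for the cluster auto-upgrade channel (node-image in this PR).
  add_weekly_window "$cluster" "$resource_group" aksManagedAutoUpgradeSchedule
  # Window for the node OS auto-upgrade channel (NodeImage in this PR).
  add_weekly_window "$cluster" "$resource_group" aksManagedNodeOSUpgradeSchedule
}
```

Usage: `configure_maintenance_windows "$AKS_CLUSTER_NAME" rg-bu0001a0008`. Keeping one helper per window makes it easy to give the two channels different days later without duplicating the flag list.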

@ferantivero marked this pull request as ready for review on March 11, 2024 18:53
@ferantivero requested a review from skabou on March 11, 2024 18:53
Resolved (outdated) review threads: 07-bootstrap-validation.md (4)
@skabou (Contributor) left a comment

Approved with some text suggestions

@ferantivero (Contributor Author) commented

Really appreciate all the contributions, @skabou; I accepted them all.

@skabou (Contributor) commented Mar 12, 2024

@ferantivero Really appreciate your work on this!

ferantivero and others added 8 commits March 20, 2024 13:34
Co-authored-by: Jason Bouska <82831332+skabou@users.noreply.github.com>
Co-authored-by: Jason Bouska <82831332+skabou@users.noreply.github.com>
…de OS Level automatically"

This reverts commit 2218f2e.

This is based on a known node channel bug:

Currently, when you set the cluster auto-upgrade channel to node-image,
it also automatically sets the node OS auto-upgrade channel to
NodeImage. You can't change node OS auto-upgrade channel value if your
cluster auto-upgrade channel is node-image.
Co-authored-by: Jason Bouska <82831332+skabou@users.noreply.github.com>
@ferantivero force-pushed the feature/192733_day2-node-updates-guidance branch from 9754715 to e99adff on March 20, 2024 16:35
Resolved (outdated) review threads: 07-bootstrap-validation.md (3)
@skabou (Contributor) commented Mar 20, 2024

👍

ferantivero and others added 2 commits March 20, 2024 16:27
Co-authored-by: Jason Bouska <82831332+skabou@users.noreply.github.com>
@ferantivero merged commit f278b71 into main on Mar 21, 2024
1 check passed
@ferantivero deleted the feature/192733_day2-node-updates-guidance branch on March 21, 2024 14:46

@skabou (Contributor) commented Mar 21, 2024

🎉
