KubeadmControlPlane should provide an explanation when it decides a Machine should be replaced #5557

Closed
dlipovetsky opened this issue Nov 2, 2021 · 13 comments · Fixed by #8959
Assignees
Labels
area/control-plane Issues or PRs related to control-plane lifecycle management kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@dlipovetsky
Contributor

User Story

As a user I would like to understand why KubeadmControlPlane (KCP) decides that a Machine needs to be replaced (in KCP parlance, the Machine "needs rollout").

Today, the logs show only that KCP has decided the Machine needs to be replaced, not why:

I1029 19:36:57.752152       1 controller.go:241] controller/kubeadmcontrolplane "msg"="Reconcile KubeadmControlPlane" "cluster"="dlipovetsky-adopt-48fd" "name"="dlipovetsky-adopt-48fd-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane" 
I1029 19:36:58.673234       1 controller.go:328] controller/kubeadmcontrolplane "msg"="Rolling out Control Plane machines" "cluster"="dlipovetsky-adopt-48fd" "name"="dlipovetsky-adopt-48fd-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane" "needRollout"=["dlipovetsky-adopt-48fd-control-plane-0"]

This is, of course, not enough information to understand the decision.

Detailed Description

Anything else you would like to add:

KCP makes the decision by applying filter functions. Each of these functions returns a boolean. I'm working on a proof-of-concept change to the filter API so that the functions return an explanation along with the boolean.
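To illustrate the idea, here is a minimal sketch of a filter that returns a reason along with its boolean. It is not the actual Cluster API code: the `Machine` struct, the `ExplainFunc` type, and the `VersionMismatch` and `NeedsRollout` helpers are all hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Machine is a simplified, hypothetical stand-in for the real Machine type.
type Machine struct {
	Name    string
	Version string
}

// ExplainFunc is a hypothetical filter shape: instead of returning only a
// boolean, it also returns a human-readable reason when the filter matches.
type ExplainFunc func(m *Machine) (matched bool, reason string)

// VersionMismatch is an illustrative filter that matches Machines whose
// version differs from the desired one.
func VersionMismatch(desired string) ExplainFunc {
	return func(m *Machine) (bool, string) {
		if m.Version != desired {
			return true, fmt.Sprintf("version %q does not match desired version %q", m.Version, desired)
		}
		return false, ""
	}
}

// NeedsRollout applies every filter to every Machine and collects the reasons,
// keyed by Machine name. Machines that match no filter are omitted.
func NeedsRollout(machines []*Machine, filters ...ExplainFunc) map[string][]string {
	reasons := map[string][]string{}
	for _, m := range machines {
		for _, f := range filters {
			if ok, why := f(m); ok {
				reasons[m.Name] = append(reasons[m.Name], why)
			}
		}
	}
	return reasons
}

func main() {
	machines := []*Machine{
		{Name: "control-plane-0", Version: "v1.21.0"},
		{Name: "control-plane-1", Version: "v1.22.0"},
	}
	for name, why := range NeedsRollout(machines, VersionMismatch("v1.22.0")) {
		fmt.Printf("machine %s needs rollout: %s\n", name, strings.Join(why, "; "))
	}
}
```

With something along these lines, the "Rolling out Control Plane machines" log entry could carry the collected reasons next to the Machine names, rather than only the `needRollout` list.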

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 2, 2021
@randomvariable
Member

/area control-plane

@k8s-ci-robot k8s-ci-robot added the area/control-plane Issues or PRs related to control-plane lifecycle management label Nov 2, 2021
@vincepri
Member

vincepri commented Nov 2, 2021

/help
/milestone v1.1

@k8s-ci-robot
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

@k8s-ci-robot k8s-ci-robot added this to the v1.1 milestone Nov 2, 2021
@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 2, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2022
@vincepri
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 31, 2022
@fabriziopandini fabriziopandini modified the milestones: v1.1, v1.2 Feb 3, 2022
@RaghavRoy145

I found this interesting and was wondering whether I could take this issue up to start contributing to CAPI. I'm still ramping up on it, so if this isn't a beginner-friendly task or is time-critical, do let me know. Otherwise I'll go ahead and assign myself!

@fabriziopandini
Member

@RaghavRoy145 unpacking KCP machine filters is not the easiest task to start with, but complexity is really a matter of opinion.

@RaghavRoy145

Thanks @fabriziopandini, and you're right.

@RaghavRoy145

/assign

@dlipovetsky
Contributor Author

I started work on this in https://github.com/dlipovetsky/cluster-api/tree/dlipovetsky/verbose-filters-v1alpha4, but I didn't have time to stick with it. I'd be happy to collaborate with you, if you'd like, @RaghavRoy145.

@RaghavRoy145

I would love to collaborate, I'm pretty sure I'll need all the help I can get! 😄

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini
Member

/triage accepted
/unassign @RaghavRoy145

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Oct 3, 2022
@sbueringer
Member

sbueringer commented Mar 15, 2023

/assign

I have something locally that comes close. I'll create a PR when I get to it (might not be very soon).

Note to myself: a first, very hacky version with a bunch of other stuff mixed in can be found here: https://github.com/sbueringer/cluster-api/commits/pr-improve-kcp-logging

@sbueringer sbueringer removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 21, 2023

8 participants