Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add Node managed labels support #7173

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

enxebre
Copy link
Member

@enxebre enxebre commented Sep 6, 2022

What this PR does / why we need it:
Implements part of #6255. Propagate managed labels from Machines to Nodes.

/hold

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 6, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 6, 2022
@sbueringer
Copy link
Member

cc @fabriziopandini

@fabriziopandini
Copy link
Member

I will take a look at this ASAP but I'm getting to the idea that one way to get this moving is to start acting on machine -> node propagation, which seems already to get consensus and then tackling md changes in a separated PR.
cc @vincepri

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enxebre thanks for keeping this effort moving, as said in the office hour meeting, I'm planning to invest more time helping on this topic

api/v1beta1/machine_types.go Outdated Show resolved Hide resolved
api/v1beta1/machine_types.go Outdated Show resolved Hide resolved
internal/controllers/machine/machine_controller_noderef.go Outdated Show resolved Hide resolved
@kfox1111
Copy link

kfox1111 commented Sep 8, 2022

Does this cover taints as well?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 20, 2022
@enxebre enxebre changed the title Add .nodeLabels support ✨ Add Node managed labels support Sep 20, 2022
@enxebre
Copy link
Member Author

enxebre commented Sep 20, 2022

Updated to scope the PR to propagate from Machines to Nodes. cc @fabriziopandini

@enxebre enxebre force-pushed the node-labels branch 2 times, most recently from e291634 to b3b9c83 Compare September 20, 2022 09:32
@enxebre
Copy link
Member Author

enxebre commented Sep 22, 2022

/test pull-cluster-api-e2e-main

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enxebre thanks for reducing the scope of this PR.

I think the only point to be addressed is about the label prefixes to be reconciled once.

The proposal states:

One-time Apply of Standard Kubernetes Labels

The machine controller will apply the standard kubernetes labels, if specified, to the Node immediately after the Node enters the Ready state (ProviderID is set). We will enforce this one-time application by making the labels immutable via a validating webhook.

WRT to the first part, I think that we should revert the order (first set labels if not already there, then set the provider ID) so the operation is re-entrant.

WRT to making those labels immutable via webhooks, I'm personally leaning toward not implementing it, because I feel we should leave admin the freedom to manage what we do not reconcile, but this is up for discussion.

I was also thinking that we can eventually also change CABPK so the labels are applied by kubeadm on node creation (and give a recommendation to all the bootstrap provider to do the same), but this could be also a follow up

api/v1beta1/machine_types.go Outdated Show resolved Hide resolved
internal/controllers/machine/machine_controller_noderef.go Outdated Show resolved Hide resolved
@enxebre
Copy link
Member Author

enxebre commented Sep 23, 2022

WRT to the first part, I think that we should revert the order (first set labels if not already there, then set the provider ID) so the operation is re-entrant.

So as the PR it's right now labels are applied along with annotations as soon as the matching Node is found. Does this sounds good? If so I'll update the proposal.

WRT to making those labels immutable via webhooks, I'm personally leaning toward not implementing it, because I feel we should leave admin the freedom to manage what we do not reconcile, but this is up for discussion.

I think I agree.

f I got it right with this approach we are continuously reconciling all the managed prefixes, but according to the proposal only one of them should be reconciled and the other two should be applied once.
Q: Should we add a note that those labels are reconciled only on node creation? (same for node-restriction)
@fabriziopandini this contradicts the previous point.

As it's right now all managed labels would be continuously reconciled from Machines to Nodes. I think that's ok particularly from a UX pov.
As it's right now when propagating this change via MachineDeployment this will trigger a rolling upgrade effectively making labels to only be applied on creation.

If the above makes sense I'll update the proposal to reflect it back.

@sbueringer
Copy link
Member

sbueringer commented Sep 23, 2022

WRT to making those labels immutable via webhooks, I'm personally leaning toward not implementing it, because I feel we should leave admin the freedom to manage what we do not reconcile, but this is up for discussion.

The reason that was introduced in the proposal is just that it's (after inline-propagation is implemented) a bit awkward if you change one of those labels and they are never rolled out at all (except if you trigger the rollout e.g. with rolloutAfter which is not implemented as of today, not sure if there any alternatives to trigger that going forward).

But making it immutable via a webhook is also not really a solution for that. So probably comes down to documenting this and getting rolloutAfter implemented in the fullness of time.

As it's right now all managed labels would be continuously reconciled from Machines to Nodes. I think that's ok particularly from a UX pov.

I'm not sure I follow. Are we talking about continuously reconciling node.cluster.x-k8s.io/* + node-role.kubernetes.io/* + node-restriction.kubernetes.io/*?

If yes, how do we resolve this issue surfaced in the proposal?

It's important to note that the these specific labels are not kept in sync, but rather applied only once after the Machine and its corresponding Node are first created. Taking authoritative ownership of these prefixes would preclude other entities from adding and removing such labels on the Node.

Or is the idea that for now it's okay as it's effectively only done on create because we don't have in-place propagation and when we implement inline-propagation we will adjust the Machine=>Node sync accordingly?

@enxebre
Copy link
Member Author

enxebre commented Sep 23, 2022

Or is the idea that for now it's okay as it's effectively only done on create because we don't have in-place propagation and when we implement inline-propagation we will adjust the Machine=>Node sync accordingly?

I was suggesting this and deviate from current proposal a bit and reconcile them all anytime in the lifecycle of the machine, If you don't want CAPI to take authoritative ownership of these prefixes, then just don't set the labels.
But happy to update the PR to something different.

@enxebre enxebre force-pushed the node-labels branch 2 times, most recently from 5cc2e73 to 83633c3 Compare September 27, 2022 11:54
Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a quick look

@enxebre
Copy link
Member Author

enxebre commented Jan 11, 2023

updated, tests currently failing because of kubernetes/client-go#992

@sbueringer
Copy link
Member

sbueringer commented Jan 11, 2023

updated, tests currently failing because of kubernetes/client-go#992

That's fun. I would assume that the client-go issue isn't resolved quick enough that we wouldn't be significantly blocked for this feature.

Given that, I would suggest we look for the best way to "fixup" the tests in a pragmatic way.

Just a first suggestion, what if we add a flag to the reconciler "disableNodeLabelSync" that we only set for our tests (+ corresponding godoc with link to the client-go issue of course)

(Otherwise lgtm +/- golangci-lint)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 11, 2023
@enxebre
Copy link
Member Author

enxebre commented Jan 11, 2023

Just a first suggestion, what if we add a flag to the reconciler "disableNodeLabelSync" that we only set for our tests (+ corresponding godoc with link to the client-go issue of course)

Makes sense, updated.

I actually think of this as a feature where we give users more control over when to update labels in a node pool by explicitly signalling it.

@enxebre enxebre force-pushed the node-labels branch 2 times, most recently from e48114e to 9a4ffef Compare January 11, 2023 20:44
@sbueringer
Copy link
Member

@enxebre looks like the linter has some findings

@enxebre
Copy link
Member Author

enxebre commented Jan 12, 2023

@enxebre looks like the linter has some findings

Thanks! fixed.

Given the current PR implementation via SSA, I think we might want to re-consider again extending the functionality to enable arbitrary labels as API input. I think it should just work with SSA and it would enable easier adoption for use cases where arbitrary labels are dictating workloads topology distribution/affinity via gitops or so. That could be an additive follow up if we wanted to thought. Thoughts?

@sbueringer
Copy link
Member

Given the current PR implementation via SSA, I think we might want to re-consider again extending the functionality to enable arbitrary labels as API input. I think it should just work with SSA and it would enable easier adoption for use cases where arbitrary labels are dictating workloads topology distribution/affinity via gitops or so. That could be an additive follow up if we wanted to thought. Thoughts?

I'm not sure if we would want to take all Machine labels and then propagate them through to the Node. Right now it's relatively obvious which ones are also going to the Nodes. I think if we want to consider propagating aribtrary labels through I would prefer doing this via a separate "node labels" field of some sort.

I'm open to discussing this in a follow-up issue, but I would keep the scope of the current PR to what we decided in the proposal.
(as I think your intention was as well)

@sbueringer
Copy link
Member

Thx!
/lgtm
from my side.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f824a33f165ff6e31bae16ac2f7bc05da6587ec0

@enxebre
Copy link
Member Author

enxebre commented Jan 12, 2023

labels through I would prefer doing this via a separate "node labels" field of some sort.

Right, I was referring to reconsider having a dedicated field for label sync in Machines given that with field input + SSA we cover current PR use case but also enable any other. Yeh let's leave that discussion aside from this PR 👍🏾.

@sbueringer
Copy link
Member

/assign @fabriziopandini @ykakarap

PTAL. I think this is pretty close to merge-ready.

@ykakarap
Copy link
Contributor

/lgtm

Looks good.

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great works! thanks @enxebre and @ykakarap for pushing this after the finish line!
/lgtm
/approve

@@ -91,6 +91,7 @@ type Reconciler struct {
// nodeDeletionRetryTimeout determines how long the controller will retry deleting a node
// during a single reconciliation.
nodeDeletionRetryTimeout time.Duration
disableNodeLabelSync bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up let's open an issue to track the migration of tests to envclient and the removal of this flag (another option might be to split up the func and then unit test the individual parts vs testing the entire controller).

In the meantime, we can use a follow-up PR to add a comment on this field and make it explicit that it is only for allowing partial testing of the controller logic using the fake client (because features based on SSA cannot be tested with it).

cc @ykakarap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up PR to add the comment: #7965

Issue to migrate tests to envtest and drop the flag: #7964 (also added to the tracking issue).

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2023
@k8s-ci-robot k8s-ci-robot merged commit 56aa8be into kubernetes-sigs:main Jan 20, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants