
✨Adds a new controller to synchronize labels between BMHs and K Nodes #152

Merged (2 commits, Apr 8, 2021)

Conversation

Arvinderpal
Contributor

@Arvinderpal Arvinderpal commented Jan 11, 2021

Introduces a new controller in CAPM3 to synchronize a specific set of labels placed on a BMH with those on the corresponding Kubernetes Node running on that BMH.

Design Proposal: metal3-io/metal3-docs#149

Example 1: Labels on the BMH are added to the Kubernetes Node

  1. Add the prefix annotation to the Metal3Cluster:
     kubectl annotate metal3cluster test1 metal3.io/metal3-label-sync-prefixes=my-prefix.metal3.io,foo.bar -n=metal3 --overwrite
  2. Add labels to the BareMetalHost:
     kubectl label baremetalhosts node-0 my-prefix.metal3.io/rack=xyz-123 -n=metal3
     kubectl label baremetalhosts node-0 foo.bar/xyz=abc -n=metal3
  3. You should now see the labels show up on the Kubernetes Node:
     kubectl get baremetalhosts --show-labels -n=metal3
     kubectl get nodes --show-labels

Example 2: Label changes made directly on the Kubernetes Node will be reconciled:

  1. Add a label directly to the Node. The label will be removed at the next reconcile cycle:
     kubectl label node test1-h4vjt my-prefix.metal3.io/color=blue
  2. Delete a synced label from the Node. The label will be re-added at the next reconcile cycle:
     kubectl label node test1-h4vjt my-prefix.metal3.io/rack-
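The prefix-matching behavior shown in the examples above can be sketched in Go. This is a hypothetical illustration, not the controller's actual code: a label is eligible for syncing only when its key falls under one of the prefixes listed in the metal3-label-sync-prefixes annotation.

```go
package main

import (
	"fmt"
	"strings"
)

// filterByPrefixes is a hypothetical helper illustrating the sync rule:
// a BMH label is copied to the Node only when its key is of the form
// "<prefix>/<name>" for some prefix in the configured prefix set.
func filterByPrefixes(labels map[string]string, prefixes []string) map[string]string {
	out := map[string]string{}
	for k, v := range labels {
		for _, p := range prefixes {
			if strings.HasPrefix(k, p+"/") {
				out[k] = v
				break
			}
		}
	}
	return out
}

func main() {
	bmhLabels := map[string]string{
		"my-prefix.metal3.io/rack": "xyz-123",
		"foo.bar/xyz":              "abc",
		"unrelated.io/key":         "ignored", // outside the prefix set, never synced
	}
	prefixes := []string{"my-prefix.metal3.io", "foo.bar"}
	fmt.Println(filterByPrefixes(bmhLabels, prefixes))
}
```

With the annotation from Example 1, the first two labels would be synced to the Node and the third ignored.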

@metal3-io-bot metal3-io-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 11, 2021
@metal3-io-bot
Contributor

Hi @Arvinderpal. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@maelk
Member

maelk commented Jan 11, 2021

/ok-to-test
Can you please add a condition to not reconcile paused objects?

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 11, 2021
@Arvinderpal
Contributor Author

> /ok-to-test
> Can you please add a condition to not reconcile paused objects?

@maelk I added the pause check for the BareMetalHost, Cluster and Metal3Cluster. Thanks.

@Arvinderpal Arvinderpal changed the title WIP: ✨Adds a new controller to synchronize labels between BMHs and K Nodes ✨Adds a new controller to synchronize labels between BMHs and K Nodes Jan 22, 2021
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2021
@Arvinderpal
Contributor Author

@dhellmann @maelk @zaneb
This PR is ready for an initial review. PTAL.
I have also updated the description with how to try out this feature.

@metal3-io-bot metal3-io-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 25, 2021
prefix, between BareMetalHost and Kubernetes Nodes.
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 25, 2021
Comment on lines 64 to 70
// +kubebuilder:rbac:groups=metal3.io,resources=baremetalhosts,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=metal3.io,resources=baremetalhosts/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=infrastructure.cluster.x-k8s.io,resources=metal3machines,verbs=get;list;watch
// +kubebuilder:rbac:groups=infrastructure.cluster.x-k8s.io,resources=metal3machines/status,verbs=get
// +kubebuilder:rbac:groups=cluster.x-k8s.io,resources=machines;machines/status,verbs=get;list;watch
// +kubebuilder:rbac:groups="",resources=events,verbs=get;list;watch;create;update;patch
// +kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch;create;update;patch;delete
Member

Do you need all those permissions? I would highly recommend trimming them down.

Contributor Author

Yes, you're right. I will only keep the read operations -- they are needed to fetch the kubeconfig of the target cluster.
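For illustration, a trimmed, read-only version of the markers above could look like the sketch below. The exact set to keep is an assumption based on this discussion, not the final code:

```go
// Hypothetical trimmed-down RBAC markers, keeping only the read verbs
// needed to watch hosts/machines and fetch the target cluster kubeconfig:
// +kubebuilder:rbac:groups=metal3.io,resources=baremetalhosts,verbs=get;list;watch
// +kubebuilder:rbac:groups=infrastructure.cluster.x-k8s.io,resources=metal3machines,verbs=get;list;watch
// +kubebuilder:rbac:groups=cluster.x-k8s.io,resources=machines;machines/status,verbs=get;list;watch
// +kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch
```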

}
return ctrl.Result{}, err
}
if host.Annotations != nil && host.Annotations[bmh.PausedAnnotation] == baremetal.PausedAnnotationKey {
Member

You should check for the presence of the annotation, not its value. There might be other values set by the user (or an empty string), so we should not care about the value when checking for pause (like in CAPI).

Contributor Author

Done.
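The suggested change can be sketched as follows. This is an illustrative snippet, not the PR's actual code, and the annotation key value shown is an assumption:

```go
package main

import "fmt"

// PausedAnnotation stands in for the BMH pause annotation key discussed
// above; the exact string here is illustrative.
const PausedAnnotation = "baremetalhost.metal3.io/paused"

// isPaused checks only for the presence of the annotation key, ignoring
// its value: users may set an arbitrary (or empty) value, and the object
// should still be treated as paused.
func isPaused(annotations map[string]string) bool {
	if annotations == nil {
		return false
	}
	_, found := annotations[PausedAnnotation]
	return found
}

func main() {
	fmt.Println(isPaused(map[string]string{PausedAnnotation: ""})) // true: paused despite empty value
	fmt.Println(isPaused(map[string]string{"other/annotation": "x"})) // false: key absent
}
```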

Contributor Author

@Arvinderpal Arvinderpal left a comment

@maelk Thanks for the review. Please take another look.

@Arvinderpal
Contributor Author

@dhellmann @zaneb PTAL.
Much appreciated.

@digambar15
Contributor

Hey @Arvinderpal, can you fix the unit test issues? Otherwise this patch looks good.

@Arvinderpal
Contributor Author

/retest

@Arvinderpal
Contributor Author

@fmuyassarov PTAL at this PR.
Also, I can't seem to issue the /retest command. I wonder if you can allow that?

@furkatgofurov7
Member

It seems you put the command inside a code block, so it did not work.

/retest

Member

@fmuyassarov fmuyassarov left a comment

Looks good to me, just some nits

controllers/metal3labelsync_controller.go — 3 review comments (outdated, resolved)
Member

@maelk maelk left a comment

One last comment, otherwise looks good to me

capm3Machine := &capm3.Metal3Machine{}
if err := r.Client.Get(context.TODO(), name, capm3Machine); err != nil {
log.Error(err, "failed to get Metal3Machine")
return nil
Member

I'm wondering if we should rather use "continue" here to not block the reconciliation on one machine. For example a user has scaled out a machine deployment but does not have enough BMH. If that machine is first in the list, it will always return on line 311. Using continue, would allow us to get past that.

Contributor Author

@maelk That's a good point. I had thought about this too and took the more conservative route. In addition to line 311, should we also use continue for 316 and 321?

Member

yes, I think for all exit conditions, we should instead use "continue"

Contributor Author

Done.
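The return-vs-continue point discussed above can be sketched with a simplified loop. This is illustrative pseudocode in Go, not the PR's actual reconcile logic:

```go
package main

import "fmt"

// machine is a pared-down stand-in for a CAPI Machine in this sketch.
type machine struct {
	name     string
	infraRef string // empty models a missing InfrastructureRef / Metal3Machine
}

// collectHosts illustrates the review suggestion: when one machine in the
// list cannot be resolved, use continue so the remaining machines are still
// processed, instead of returning early and blocking reconciliation for
// every machine after the bad one.
func collectHosts(machines []machine) []string {
	var hosts []string
	for _, m := range machines {
		if m.infraRef == "" {
			continue // skip this machine; do not abort the whole loop
		}
		hosts = append(hosts, m.infraRef)
	}
	return hosts
}

func main() {
	ms := []machine{{"m0", ""}, {"m1", "m3m-1"}, {"m2", "m3m-2"}}
	// m0 has no infra ref and is skipped; m1 and m2 are still processed.
	fmt.Println(collectHosts(ms))
}
```

With an early return instead of continue, a scaled-out MachineDeployment lacking enough BMHs would stall the loop at the first unresolved machine, which is exactly the scenario raised in the review.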

Member

What about line 306? There could be situations (though hopefully not frequent) where a Machine was created without a Metal3Machine (yet?). We should probably use continue there too, no?

Contributor Author

@maelk The if statement above "if m.Spec.InfrastructureRef.Name == nil", should catch the case where Machine exists but Metal3Machine does not. I suppose there is a rare corner case where InfrastructureRef is updated but Metal3Machine is not yet created. I will make the change.

@kashifest
Member

/lgtm

@metal3-io-bot metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 30, 2021
@Arvinderpal
Contributor Author

@fmuyassarov Can you issue /test-integration? Seems to get ignored when I issue it.

/test-integration

@kashifest
Member

/test-integration
/test golint

Member

@fmuyassarov fmuyassarov left a comment

/lgtm
/cc @maelk @kashifest

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2021
@maelk
Member

maelk commented Apr 1, 2021

@Arvinderpal I have a nit, but I think this could be good to go as is and fixed afterwards, as you wish. I will approve the PR and put a hold. If you want to merge it and fix it after, that's OK by me since it's a detail; then just remove the hold. But if you fix it in this PR, you'll just need another lgtm.
/approve
/hold

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2021
@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Arvinderpal, digambar15, maelk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2021
@maelk
Member

maelk commented Apr 1, 2021

Hmm, my comment is not easily visible; it is regarding line 306, where we should probably use a continue too.

Metal3Cluster that defines prefix set.
Only labels that fall within prefix set will be synchronized.
@metal3-io-bot metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2021
@Arvinderpal
Contributor Author

/retest

@Arvinderpal
Contributor Author

Can someone kick off /test-integration for me?
I have addressed all the open issues, so we should be in good shape to merge.

@kashifest
Member

/test-integration

@kashifest
Member

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 6, 2021
@Arvinderpal
Contributor Author

@maelk @fmuyassarov PTAL. Should be ready to merge.

@fmuyassarov
Member

/hold cancel

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2021
@fmuyassarov
Member

/test golint

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
7 participants