kubeadm should support adding restricted labels on Nodes #2509

Closed
Arvinderpal opened this issue Jun 17, 2021 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@Arvinderpal

FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): All

Environment:

  • Kubernetes version (use kubectl version): 1.16+

What happened?

Currently, specifying a restricted label via the kubelet's --node-labels flag will cause node bootstrapping to fail, with the kubelet throwing an error.

What you expected to happen?

I would like an easy and secure means of assigning roles to my nodes. Previously, the kubelet's self-labeling mechanism could be used. However, security concerns have restricted the allowed set of labels; in particular, the kubelet is mostly prohibited from self-labeling within the *.kubernetes.io/ and *.k8s.io/ namespaces.

At a minimum, kubeadm should support adding restricted labels with prefixes:

node-role.kubernetes.io/<role>

  • This label has been used widely in the past to identify the role of a node. Additionally, kubectl get node looks for this specific label when displaying the role to the user.

node-restriction.kubernetes.io/

  • Use of this prefix is recommended in the Kubernetes docs on node restriction for things such as workload placement.
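
For example, a workload could then be steered onto such nodes with a nodeSelector (illustrative only; the label matches the reproduction example below, while the Pod name and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: secure-app
    spec:
      # schedule only onto nodes carrying the node-restriction label
      nodeSelector:
        node-restriction.kubernetes.io/secure-workload: "true"
      containers:
      - name: app
        image: k8s.gcr.io/pause:3.5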

How to reproduce it (as minimally and precisely as possible)?

For example:

    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: node-role.kubernetes.io/dmz-worker,node-restriction.kubernetes.io/secure-workload=true

The following error is produced:

kubelet[2083]: --node-labels in the 'kubernetes.io' namespace must begin with an allowed prefix (kubelet.kubernetes.io, node.kubernetes.io) or be in the specifically allowed set(beta.kubernetes.io/arch, beta.kubernetes.io/instance-type, ...

Anything else we need to know?

Workarounds that have been recommended include:

  • Users can use label prefixes outside the restricted set (see the sketch after this list); however, this poses security concerns (such labels can also be self-set by a compromised node) and should be avoided for security-sensitive placement.

  • Users can use kubectl to manually label the nodes. This is undesirable for multiple reasons:

    • The delay between the Node becoming Ready and the user applying the label manually can lead to the scheduler placing pods on the node that should not be there.
    • Inconsistencies and human error can come into play.
    • Declarative tools (e.g. Cluster API) that use kubeadm for node bootstrapping cannot specify restricted labels in their manifests; kubectl-based workarounds are recommended instead.
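
For reference, the first workaround looks roughly like this (the example.com/role key is hypothetical; any user-owned prefix outside the kubernetes.io/k8s.io namespaces is accepted by the kubelet):

    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          # unrestricted, user-owned prefix; the kubelet may self-apply this label
          node-labels: "example.com/role=dmz-worker"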

Possible approach for kubeadm:

Kubeadm currently adds the restricted label "node-role.kubernetes.io/master" using the Kubernetes API client during the MarkControlPlane phase. A similar approach could be taken to add user-specified restricted labels to worker as well as master nodes.
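
For illustration, this is roughly what the control-plane Node object looks like after the MarkControlPlane phase (node name is hypothetical; the label is set via the API client, not via kubelet self-registration):

    apiVersion: v1
    kind: Node
    metadata:
      name: cp-0
      labels:
        # added by kubeadm with admin credentials during MarkControlPlane
        node-role.kubernetes.io/master: ""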

Related

eksctl-io/eksctl#2363
#1060

@Arvinderpal
Author

Just brainstorming a possible approach:

  • Similar to how kubeadm adds the restricted label node-role.kubernetes.io/master to the master nodes, we could add support for other user-specified labels for both worker and master nodes.
  • The master label is not added via the kubelet's node registration options, because it uses a restricted prefix; instead, it is added during the MarkControlPlane phase using the Kubernetes API client.
  • We could extend MarkControlPlane to also add restricted labels. For workers, we could define a new phase - 'MarkWorkers' - which would add the restricted labels to the worker Node objects.
  • How does the user specify the restricted labels to kubeadm?
    • (1) Parse the labels the user specifies in nodeRegistration.kubeletExtraArgs["node-labels"]
      • Any node-role.kubernetes.io and node-restriction.kubernetes.io labels that are present are removed from that list and are instead applied later in the MarkControlPlane/MarkWorkers phase.
      • This should not be a breaking change and should not require any API changes.
      • This may create some confusion, however, since "node-labels" is a kubelet flag.
    • (2) Define a new field under Init/JoinConfiguration for specifying restricted labels, similar to how Taints are specified:
    kind: JoinConfiguration
    nodeRegistration:
      restrictedLabels: []
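
For example, filled in with the labels from the reproduction above (a purely hypothetical API sketch; no restrictedLabels field exists in kubeadm today):

    kind: JoinConfiguration
    nodeRegistration:
      restrictedLabels:
      # these would be applied by kubeadm via the API client, not by the kubelet
      - "node-role.kubernetes.io/dmz-worker"
      - "node-restriction.kubernetes.io/secure-workload=true"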

Race condition (corner case):

  • There is a potential race between the pod scheduler and the label being applied to the node. This is also an issue with how we label and taint control-plane nodes today - Move Control Planes taint to kubelet config instead of markcontrolplane phase #1621.
  • The proposed solution there is to taint the control-plane node via the kubelet's --register-with-taints flag during the kubelet phase.
  • A similar approach could be used here: taint nodes with some preset taint via --register-with-taints, then remove the taint during the MarkControlPlane/MarkWorkers phase after the restricted labels have been applied (see the sketch after this list).
  • Potential issues with DaemonSets (e.g. for CNIs):
    • Taints will also prevent DaemonSets from being scheduled on the node -- some, such as CNI DaemonSets, are prerequisites for the Node to reach Ready status. This needs to be investigated.
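
A rough sketch of that idea (the taint key example.com/awaiting-labels is hypothetical):

    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          # the kubelet registers the node with this taint; kubeadm would remove it
          # in MarkControlPlane/MarkWorkers once the restricted labels are applied
          register-with-taints: "example.com/awaiting-labels=true:NoSchedule"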

@neolit123
Member

neolit123 commented Jun 17, 2021

one problem here is that joining nodes that do not have the admin.conf credential (e.g. workers) do not have sufficient permissions to self-label with node-role.kubernetes.io/foo (same for node-restriction...). this is actually a security concern and we don't want workers to self-label with these special labels.

you can try that with kubectl label --kubeconfig /etc/kubernetes/kubelet.conf ... node-role.kubernetes.io/foo= from a worker and you'd see a "forbidden" error.

since kubeadm is not a controller, it has no way to auto-label newly joining workers with an admin credential.

@neolit123 neolit123 added kind/feature Categorizes issue or PR as related to a new feature. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Jun 17, 2021
@neolit123
Member

on the SIG Cluster Lifecycle level, in the past it has been discussed multiple times that higher level tools can do that - cluster-api, kops, minikube, kind.

@Arvinderpal
Author

you can try that with kubectl label --kubeconfig /etc/kubernetes/kubelet.conf ... node-role.kubernetes.io/foo= from a worker and you'd see a "forbidden" error.

Ah, of course. 🤦

on the SIG Cluster Lifecycle level, in the past it has been discussed multiple times that higher level tools can do that - cluster-api, kops, minikube, kind.

The problem I see is that these higher-level tools generally run into the same issue as the kubectl workaround -- a race exists between the scheduler and the tool/controller that applies the labels. This can lead to unwanted workloads on the node(s) of interest -- which is kind of a big deal, IMO.

I suppose some mechanism to prevent these unwanted workloads could be devised. For example, kubeadm does support taints via NodeRegistrationOptions. The higher-level tools could use that to taint new nodes via the kubelet's --register-with-taints flag and then remove the taint after the labels have been applied. We just need to think through the consequences of that -- for example, the taint would temporarily prevent CNIs from being installed on the nodes, but perhaps that is acceptable given that the taint is only applied for a short duration.
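
A minimal sketch of that, assuming the higher-level tool owns a hypothetical taint key like example.com/awaiting-labels and removes it once the labels are in place:

    kind: JoinConfiguration
    nodeRegistration:
      taints:
      # keeps regular workloads off the node until the tool labels it
      # and removes the taint; the key/value are hypothetical
      - key: "example.com/awaiting-labels"
        value: "true"
        effect: "NoSchedule"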

For reference, here is the relevant CAPI issue: kubernetes-sigs/cluster-api#493

@neolit123
Member

neolit123 commented Jun 17, 2021

it can certainly be orchestrated at a higher level; it's just that it's blocked by the joining node's limited credentials on the kubeadm side and by the fact that kubeadm is not a service. it could be added as a feature of an eventual kubeadm operator:
#2315
#2318

today, we could allow Labels under NodeRegistrationOptions the same way we have Taints, but then we would have to document that the kubernetes.io* labels are reserved and not allowed to be self-set by node clients. this would make the Labels field less useful, since what users mostly want is node-role labels, so that the operator running kubectl can see the "role" entries for these nodes. they can also do that with kubeletExtraArgs, but not for node-role, of course.

@neolit123
Member

For reference, here is the relevant CAPI issue: kubernetes-sigs/cluster-api#493

while i don't know all the details of that thread, it seems natural for Cluster API's machine controller to maintain the node labels without involving kubeadm (which acts as the CAPI bootstrap provider).

  • it has the credentials to do that
  • it knows what machine specs the users feed
  • it knows what machines are present

@Arvinderpal
Author

@neolit123 feel free to close this issue or leave it open if you think that's appropriate.
Might be a good reference for other people who come along requesting the same feature.

@neolit123
Member

Added xref in #2315 as a potential kubeadm operator feature.
