Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not possible to represent absence of Taints when marshalling kubeadm configuration #1358

Closed
dlipovetsky opened this issue Jan 23, 2019 · 31 comments · Fixed by kubernetes/kubernetes#77345
Assignees
Labels
area/UX kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@dlipovetsky
Copy link

dlipovetsky commented Jan 23, 2019

Versions

kubeadm API v1alpha3 and v1beta1 (releases 1.11, 1.12, 1.13)

What happened?

I wanted to use a Go program to generate kubeadm configuration, such that no taints are applied to the node. I imported the kubeadm types, initialized an InitConfiguration struct, and set NodeRegistration.Taints to an empty slice. I then marshalled the struct YAML and wrote this to a configuration file. I observed that NodeRegistration.Taints was not present in the file. When I passed the configuration file to kubeadm init, I observed that kubeadm applied the default set of taints to the node.

What you expected to happen?

The Taints field is explained in the code as follows:

Taints specifies the taints the Node API object should be registered with. If this field is unset, i.e. nil, in the kubeadm init process it will be defaulted to []v1.Taint{'node-role.kubernetes.io/master=""'}. If you don't want to taint your master node, set this field to an empty slice, i.e. taints: {} in the YAML file. This field is solely used for Node registration.

I want to use the kubeadm types in a Go program and set the Taints field to an empty slice. However, when I marshal the struct, the field is omitted (because it is tagged omitempty).

How to reproduce it (as minimally and precisely as possible)?

Here is a Go program that generates InitConfiguration with NodeRegistration.Taints set to an empty slice, then marshals the configuration to YAML, then unmarshals it into a fresh InitConfiguration struct.. Note that, in the fresh configuration struct, NodeRegistration.Taints is nil, not an empty slice.

package main

import (
	"fmt"
	"log"

	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubeadm "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Create an InitConfiguration that sets taints to an empty slice
	srcConfig := kubeadm.InitConfiguration{
		TypeMeta: metav1.TypeMeta{
			Kind:       "InitConfiguration",
			APIVersion: "kubeadm.k8s.io/v1beta1",
		},
		NodeRegistration: kubeadm.NodeRegistrationOptions{
			// empty slice means no taints should be applied
			Taints: []v1.Taint{},
		},
	}
	b, err := yaml.Marshal(srcConfig)
	if err != nil {
		log.Fatal(err)
	}

	dstConfig := kubeadm.InitConfiguration{}
	err = yaml.Unmarshal(b, dstConfig)
	if err != nil {
		log.Fatal(err)
	}
	// Taints is not an empty slice
	fmt.Println(dstConfig.NodeRegistration.Taints == nil)
}

Output:

true
@neolit123
Copy link
Member

neolit123 commented Jan 23, 2019

I want to use the kubeadm types in a Go program and set the Taints field to nil. However, because it is not a pointer, I can not

that is true. applying a taint if the list is empty by default is not ideal.

but also our comment in the API is incorrect as the taint is not applied to JoinConfiguration, yet JoinConfiguration also embeds NodeRegistrationOptions.

is this happening in the JoinConfiguration (bug?) or InitConfiguration?

in terms of actions that the logic performs - for kubeadm init it's already possible to skip the phase where the node is tainted as master.

kubeadm init .... --skip-phases=mark-control-plane

but this also skips the label:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/markcontrolplane/markcontrolplane.go

@neolit123 neolit123 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Jan 23, 2019
@dlipovetsky
Copy link
Author

dlipovetsky commented Jan 23, 2019

@neolit123 This is happening in the InitConfiguration.

To be clear, I am using a Go program to generate the InitConfiguration, and am unable to set NodeRegistration.Taints to nil an empty slice. But I have to set it to nil to an empty slice in order for kubeadm to apply no taints to the master.

@neolit123
Copy link
Member

neolit123 commented Jan 23, 2019

understood,

so even if you set the list of taints to { } kubeadm will still taint the node as master.
this is by design, which is not ideal.

but, when executing kubeadm init you can skip the taint using:

kubeadm init .... --skip-phases=mark-control-plane

()note that this will skip the label too)

wouldn't that solve the problem?

@dlipovetsky
Copy link
Author

dlipovetsky commented Jan 23, 2019

so even if you set the list of taints to { } kubeadm will still taint the node as master.
this is by design, which is not ideal.

Actually, kubeadm will not taint the master if the marshalled configuration sets taints to nil an empty slice. (The comment explains this correctly)

nodeRegistration:
  taints: {}

It's possible to create such a configuration by hand, just not using a Go program that imports kubeadm types. If NodeRegistration.Taints is set to nil an empty slice (as in the example program I included), the field is omitted during marshalling.

@neolit123
Copy link
Member

neolit123 commented Jan 23, 2019

b, err := yaml.Marshal(cfg)

while this library is part of the pipeline currently, consumers of the kubeadm config should be using the serializers from api-machinery or the new component-config repo that is WIP.

It's possible to create such a configuration by hand, just not using a Go program that imports kubeadm types. If NodeRegistration.Taints is set to nil (as in the example program I included), the field is omitted during marshalling.

i need to verify this.
sounds like a marshaling side effect of "sigs.k8s.io/yaml"

the marshaling problem aside.

how about this as a workaround?

kubeadm init .... --skip-phases=mark-control-plane

@dlipovetsky
Copy link
Author

Ugh, I'm sorry, I mixed up nil and empty slice. kubeadm will apply default taints if the taints slice is nil, and will not apply taints if the taints field is an empty slice. If I set taints to an empty slice in a Go program, it the field will be omitted, because it's tagged "omitempty." I'll make corrections to the comments.

@dlipovetsky
Copy link
Author

I updated the example program to illustrate what exactly what happens. I create the struct, set Taints to an empty slice, marshal it, then unmarshal to a fresh config, and now Taints is nil, not an empty slice.

@neolit123
Copy link
Member

I updated the example program to illustrate what exactly what happens. I create the struct, set Taints to an empty slice, marshal it, then unmarshal to a fresh config, and now Taints is nil, not an empty slice.

thank you for the update.

If I set taints to an empty slice in a Go program, it the field will be omitted

since our config is in beta, we won't be able to modify this field for a while.
cc @fabriziopandini @rosti

@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. area/UX priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Jan 23, 2019
@neolit123
Copy link
Member

to re-iterate, while this seems like a problem in the config, it appears to be only a problem when marshaling and not if the users are writing a config by hand.

@dlipovetsky
Copy link
Author

@neolit123 That's correct, it's a problem only when marshalling.

@dlipovetsky
Copy link
Author

My Cluster API provider marshals kubeadm configuration. I think doing this is quite natural when kubeadm is a part of some larger automation that is written in Go. Now that kubeadm is GA, I expect more tools to marshal kubeadm configuration.

@dlipovetsky
Copy link
Author

Looks like this issue has come up before. Here's what the API conventions say:

In most cases, optional fields should also have the omitempty struct tag (the omitempty option specifies that the field should be omitted from the json encoding if the field has an empty value). However, If you want to have different logic for an optional field which is not provided vs. provided with empty values, do not use omitempty (e.g. kubernetes/kubernetes#34641).

@neolit123
Copy link
Member

neolit123 commented Jan 24, 2019

thanks for the link ^.
we should definitely look into this. one problem as already mentioned is that the API is beta and we cannot change it quickly?

for now a workaround is to patch the marshaled yaml string, which isn't a great solution.

@dlipovetsky
Copy link
Author

for now a workaround is to patch the marshaled yaml string, which isn't a great solution.
Agreed.

Specifically for users who want kubeadm init to not apply the node-role.kubernetes.io/master:NoSchedule taint--but continue to apply the node-role.kubernetes.io/master label--to masters, I think using the node-role.kubernetes.io/master:PreferNoSchedule is a workaround. With this taint, Pods will get scheduled on masters, unless other nodes have capacity. See docs for details.

@timothysc timothysc added this to the v1.14 milestone Jan 28, 2019
@timothysc
Copy link
Member

/assign @timothysc

@rosti
Copy link

rosti commented Jan 31, 2019

I don't think, that this can be fixed in the context of v1beta1. Consider this config file:

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration

As of now (with omitempty on taints) this will be defaulted to contain the master taint here, because cfg.Taints is nil.
If we remove omitempty, then at the same spot, cfg.Taints will always be a slice (although in our case an empty one).

Therefore, to continue to taint the master, one must add the taint manually in the config like so:

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master

This is a breaking change and unacceptable in v1beta1.

On the other hand it is worthy of consideration and probably the best approach for a future config version.

@rosti
Copy link

rosti commented Jan 31, 2019

/kind api-change

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jan 31, 2019
@dlipovetsky
Copy link
Author

dlipovetsky commented Feb 1, 2019

If we remove omitempty, then at the same spot, cfg.Taints will always be a slice (although in our case an empty one).

I don't understand what you are trying to say. Can you please clarify?

Removing the omitempty tag does not affect unmarshalling. It does affect marshalling, but the below demonstrates that, when omitempty is removed, kubeadm will continue to set the default taint if the user provides no taints at all. This is because, without omitempty, Go does not omit a nil slice, but instead marshals to null. And a null is unmarshalled to a nil slice. Thus, kubeadm will continue to apply the default taint to masters.

package main

import (
	"fmt"
	"log"

	"k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

type OldNodeRegistrationOptions struct {
	Taints []v1.Taint `json:"taints,omitempty"`
}

type NewNodeRegistrationOptions struct {
	Taints []v1.Taint `json:"taints"`
}

func main() {
	old := OldNodeRegistrationOptions{}
	b, err := yaml.Marshal(old)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf(string(b))

	new := NewNodeRegistrationOptions{}
	b, err = yaml.Marshal(new)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf(string(b))
}

Output:

{}
taints: null

@timothysc timothysc modified the milestones: v1.14, v1.15 Feb 14, 2019
@dlipovetsky
Copy link
Author

As recommended by the API conventions, the issue can be resolved by removing the "omitempty" removed from the Taints field.

As far as I understand, removing the "omitempty" tag changes only how the field is marshalled. It does not change how the field is unmarshalled. It does not change the default value kubeadm sets. Finally, it does not change the field's type.

Given the above, is there any reason why removing the "omitempty" tag would break v1beta1 API compatibility? Thanks in advance for feedback.

@rosti
Copy link

rosti commented Feb 26, 2019

@dlipovetsky I am really sorry for the late response.

Say kubeadm or third party tool marshales to v1beta1 config with 1.13.3 vendored in. If it doesn't care about taints (which is the usual case) it will simply not initialize that field and when the config is marshalled to YAML it won't have taints key.
Next time when kubeadm uses that config to initialize a cluster, it will unmarshal the config and detect, that Taints is nil. If this is happening on init or join --control-plane it will cause kubeadm to substitute the nil of Taints with an array with only one element - the master taint (see here).

Now, consider, that after some time the same piece of software (be it kubeadm or third party) vendors in newer k8s version, where v1beta1 does not have omitempty on the Taints key. In that case the same piece of software will generate a line of taints: {} when marshalling. When kubeadm unmarshals the config, it will detect the empty array and, thus, not setup the master taint. This is much like what we tell users to do in case they don't want a master taint.

So the same piece of marshaller software, using the same config version, but with a different k8s version causes different output and different behavior when kubeadm uses that config.
That is not acceptable. What we can do is introduce v1beta2, that won't have omitempty on Taints. In that case the old software can still vendor in newer k8s versions and use v1beta1 without breaking stuff.

@dlipovetsky
Copy link
Author

@rosti

Now, consider, that after some time the same piece of software (be it kubeadm or third party) vendors in newer k8s version, where v1beta1 does not have omitempty on the Taints key. In that case the same piece of software will generate a line of taints: {} when marshalling.

Question for you: if this piece of software marshals taints: null (not taints: {}), does the problem you describe exist?

@rosti
Copy link

rosti commented Feb 27, 2019

@dlipovetsky

Now, consider, that after some time the same piece of software (be it kubeadm or third party) vendors in newer k8s version, where v1beta1 does not have omitempty on the Taints key. In that case the same piece of software will generate a line of taints: {} when marshalling.

Question for you: if this piece of software marshals taints: null (not taints: {}), does the problem you describe exist?

The answer is no and I know, that this will be the normal case.

Actually I did not express myself clearly in the quoted lines above. What I meant is that people may depend on that omitempty misplacement bug and not know or care about it. Consider this:

initCfg := v1beta1.InitConfiguration{
    // ... more stuff here
    NodeRegistration: v1beta1.NodeRegistrationOptions{
         Taints: []v1.Taint{}, // make this an empty slice, handy if we can conditionally append something here
    },
}

// fiddle with initCfg (set some fields, possibly append to taints too)

// We can even restore control plane taint if we have appended anything
if len(initCfg.NodeRegistration.Taints) > 0 {
    initCfg.NodeRegistration.Taints = append(initCfg.NodeRegistration.Taints, ControlPlaneTaint)
}

yaml.Marshal(initCfg)

Now let's assume, that we didn't add any taints.
As you know from your experience, with the current version of the config this won't have a taints key in the resulting YAML and if it's consumed by kubeadm, it will apply the master taint by default.
If we remove the omitempty, then this piece of code will suddenly start to produce taints: {} in the resulting YAML, causing kubeadm to stop applying the master taint.

Yes, I admit, this is a corner case. It's not written to the full spec. But I bet, that there is someone who has written their code in that manner and they don't expect the config behavior to change for them, unless they flip the config version. That will be a nasty surprise for them and they'll notice it when random pods start to get scheduled on their control plane nodes.

We just have to be careful with public APIs. Fixing a bug somewhere for someone does not mean, that this won't break someone else's code. That's why we have to do the fixes in new versions.

@dlipovetsky
Copy link
Author

@rosti Thanks for giving more detail. I think we agree that, if omitempty is removed, then by default Taints will be marshalled as taints: null.

Yes, I admit, this is a corner case. It's not written to the full spec.

The code snippet you share assigns an empty slice to the Taints field, which in v1beta means no taints should be applied.

If I understand correctly, you are saying that some tool both

  1. Assigns an empty slice to the Taints field
  2. Wants kubeadm to apply the master taint

I do not think this is a corner case; I think this tool has a bug. To fix the bug, the tool should leave the Taints field as nil.

The omitempty tag in v1beta1 hides the tool's bug. And removing the omitempty tag in v1beta1 will expose the tool's bug. Is that what worries you?

@rosti
Copy link

rosti commented Feb 28, 2019

@dlipovetsky you are correct. It's a bug on our side and, possibly, relies on a bug on the other side for this to happen. But sometimes we need to be bug compatible and support our own bugs (especially if we are shipping a public API).
If it was an alpha config, then I would have been more keen on fixing it in place, but with beta one I am reluctant on doing that.

@fabriziopandini @timothysc WDYT?

@neolit123
Copy link
Member

neolit123 commented Feb 28, 2019

we can make the decision to change or not change next cycle (change/break)... ⚖️
i'm +1 with an action item in the change log, with the change batched with other v1beta1 config changes.

for now, is it worth documenting node-role.kubernetes.io/master:PreferNoSchedule is the workaround in here?
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/

@dlipovetsky
Copy link
Author

But sometimes we need to be bug compatible and support our own bugs (especially if we are shipping a public API)

@rosti, I think we agree that a tool that uses the v1beta1 API correctly will not be affected by the bug fix. If such a tool were affected, I would agree that we should bump the version before fixing the bug.

Should we hold off fixing an API bug, even if the fix only affects tools that use the API incorrectly? I'm not sure. I think this is a subtle and interesting question.

@dlipovetsky
Copy link
Author

@neolit123 Good idea! I'll file a PR on the website repo.

@neolit123
Copy link
Member

please ping me on that.

@rosti
Copy link

rosti commented Mar 1, 2019

It's probably best to bring this topic on next week's kubeadm office hours and gather some opinions.

@neolit123
Copy link
Member

i think it was discussed already during a cluster api meeting and possibly a kubeadm office hours.
the cluster api meeting i remember clearly.

@dlipovetsky
Copy link
Author

I haven't discussed this in kubeadm office hours; I'll try to attend next week.

@rosti, I appreciate all of your feedback! Believe it or not, I recently voiced a similar concern on another PR:

Changing parseTaint can also break compatibility for external consumers, since parseTaint is used in kubectl taint and kubelet --register-with-taints, and external tools may rely on the current (more conservative) parsing to reject the format key:effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/UX kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants