-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds PolicySetupTimeoutSeconds option to CalicoNetwork #3186
Adds PolicySetupTimeoutSeconds option to CalicoNetwork #3186
Conversation
Going to mark this PR as Ready for Review, although there are still two merge tests that are failing, that I'm a little confused about. Edit: fixed |
b9422d9
to
362c6bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, left a few comments
api/v1/installation_types.go
Outdated
// PolicySetupTimeoutSeconds delays new pods from going ready | ||
// until their policy has been programmed in the dataplane. | ||
// The specified delay defines the maximum amount of time | ||
// that the Calico CNI will wait for policy to be programmed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the Calico CNI plugin" or just "Calico"
"CNI plugin" refers to a component, and "Calico" refers to the system as a whole.
@@ -349,6 +349,13 @@ func validateCustomResource(instance *operatorv1.Installation) error { | |||
} | |||
|
|||
} | |||
|
|||
if instance.Spec.CalicoNetwork.PolicySetupTimeoutSeconds != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need validation here to prevent setting this in other invalid scenarios - e.g., when Calico networking isn't being used, or when Calico in eBPF mode is being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Windows? Do we prevent this option in a cluster with Windows nodes? Do we just ignore it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been working off the assumption that we would ignore this option on Windows, how does the operator deal with the issue of windows compat? Does it know if a windows node exists in the cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I could accept it in the operator, ignore it in felix/cni, if thats an acceptable paradigm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, we would expect that if a config option isn't supported in a certain configuration, that we return an error and prevent installation of the code in that configuration. We should probably do that for this if Windows is enabled in the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im unfamiliar with this part of the operator resources, but my first thought was to check if the windows dataplane config option was non-nil. But, that would presumably only tell me that some nodes may be windows, and I still may want the feature on for non-windows nodes 🤔 any guidance appreciated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no such thing as a "Windows only" cluster - they all have at least one Linux node since the control plane components don't run on Windows.
The usual ideal we strive for is that if configuration is accepted it means it is going to be implemented fully. We don't really have a mechanism to report back "This configuration is sort of working but not entirely"
I think your options are:
- Reject the config outright if Windows is enabled.
- Add support for this feature to Windows.
- Change the config model so that this is very clearly Linux only (e.g.,
LinuxPolicySetupTimeoutSeconds
)
The last one is the least desirable IMO
@@ -355,6 +355,9 @@ func validateCustomResource(instance *operatorv1.Installation) error { | |||
if *instance.Spec.CalicoNetwork.PolicySetupTimeoutSeconds < 0 { | |||
return fmt.Errorf("Installation CNI spec.PolicySetupTimeoutSeconds negative value is not valid") | |||
} | |||
if instance.Spec.CalicoNetwork.LinuxDataplane != nil && *instance.Spec.CalicoNetwork.LinuxDataplane != operatorv1.LinuxDataplaneIptables { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want this?
if instance.Spec.CalicoNetwork.LinuxDataplane == nil || *instance.Spec.CalicoNetwork.LinuxDataplane != operatorv1.LinuxDataplaneIptables {
Basically if the data plane is unset, then we don't know if it's OK to enable this feature or not and we need to wait until it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably also want a check on WindowsDataplane here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the question I have above, I think this feature would still be desirable in a mixed cluster with windows and linux, if its possible to continue when that is the case. Maybe I could error if only a Windows dataplane is defined, and no Linux one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I could error if only a Windows dataplane is defined, and no Linux one?
Is there a reason we haven't implemented this for Windows?
dedcd3b
to
d60f023
Compare
pkg/render/node.go
Outdated
@@ -628,6 +629,10 @@ func (c *nodeComponent) createCalicoPluginConfig() map[string]interface{} { | |||
|
|||
apiRoot := c.cfg.K8sServiceEp.CNIAPIRoot() | |||
|
|||
LinuxpolicySetupTimeoutSeconds := int32(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a private variable, and Policy
should be capitalized
linuxPolicySetupTimeoutSeconds
d60f023
to
2ffebfa
Compare
2ffebfa
to
b434af5
Compare
Removing "merge-when-ready" label due to new commits |
…tartup-delay functionality in Felix/CNI
b434af5
to
dab7623
Compare
Sibling PR to be merged first
Description
Adds option to CalicoNetwork to enable a policy-setup timeout in the CNI/Felix.
Felix's new option
EndpointStatusPathPrefix
gets set to it's /var/run/calico dir to enable the Felix-side feature, where-in it will create a directory "status".CNI should have its timeout configured according to the CalicoNetwork field, and it's
EndpointStatusDir
should be set to Felix's /var/run/calico dir plus "/status".For PR author
make gen-files
make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
kind/bug
if this is a bugfix.kind/enhancement
if this is a a new feature.enterprise
if this PR applies to Calico Enterprise only.