Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add flag --cluster-type to disable auto discovery #2923

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

akesser
Copy link

@akesser akesser commented Oct 11, 2023

Description

This pull request adds a flag --cluster-type that allows users to set the used cluster type and disable auto discovery, that may lead to false results. This is a new feature to fix #2920

Without this fix tigera cannot be used in a EKS cluster that uses fqdnnetworkpolicies.networking.gke.io, as the autodiscovery mechanism of tigera believes to be on a GKE cluster in this case. I tested this change locally without and with the flag using different valid and invalid values.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@CLAassistant
Copy link

CLAassistant commented Oct 11, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good escape hatch to have. @tmjd what do you think? We had talked about simply re-ordering the execution of the auto-discovery checks, but this is comparatively more robust (albeit, requiring more work on the part of the user).

Perhaps we want to do this, and also move the EKS check above the GKE check so that you won't have to use this flag in your particular scenario anyway?

@tmjd
Copy link
Member

tmjd commented Oct 16, 2023

I think this is an acceptable solution. I'd have concern with changing the auto discovery ordering that others hit an issue because of reordering. I'd probably have that concern with any changes to the auto discovery code.

@tmjd
Copy link
Member

tmjd commented Oct 17, 2023

I think another option I would consider is to remove the use of the auto discovered provider except for the purpose of adding it to the Installation if there is none set and remove the mismatch error.

It would also require changing many of the controllers to use the provider in the Installation rather than the auto-detected one.

I think something like this would be more in line with the rest of the operator configuration. In general it audo-detects settings or uses a default and saves the auto-detected setting to the config if it was unset, otherwise it uses what is configured.

@caseydavenport
Copy link
Member

caseydavenport commented Oct 26, 2023

@tmjd what do you think - should we go ahead with this approach or take on the slightly larger task of properly allowing the Installation spec field to override the auto-detected value?

I have a slight preference for using the API to configure this myself. I think that will be most intuitive / discoverable for users who hit this issue.

@tmjd
Copy link
Member

tmjd commented Oct 27, 2023

I think we should go with

properly allowing the Installation spec field to override the auto-detected value

That is how the rest of the API works and I don't think there is any particular reason we should differ with the provider.

So the change being suggested is to remove the validation that the auto-detected provider matches the provider in the Installation resource. Also it will be necessary to adjust any place that used the auto-detected provider (which I think it is most if not all controllers) to now use the provider as specified in the Installation resource.

@caseydavenport in main.go we check that the provider is not OpenShift before checking to set UsePSP. I guess checking if PSPs should be used will need to move into all the controllers also. Do you know if there anything we should particularly be concerned about since it uses the clientset which we don't usually use?

@akesser is it clear what we're suggesting and would you be interested in making updates to implement those changes?

@akesser
Copy link
Author

akesser commented Oct 28, 2023

Using the auto detection and allowing the kubernetesProvider of the installation to overwrite the value would force everybody to always set the field in every installation object because you would never know, if a) you need this field at the moment and b) if you will need this in the future.

In our case, we installed first the operator and created the installation, after that we in stalled fqdnnetworkpolicies.networking.gke.io. The operator was running without problems till the first restart of the pod. Why should one repeat to add a field when the person installing the provider could simply add a flag? You could simply merge this pull request and later deprecate and remove the kubernetesProvider field.

@tmjd
Copy link
Member

tmjd commented Oct 30, 2023

Using the auto detection and allowing the kubernetesProvider of the installation to overwrite the value would force everybody to always set the field in every installation object because you would never know, if a) you need this field at the moment and b) if you will need this in the future.

The operator defaults the kubernetesProvider field when it is auto detected, so everybody that has already ran the operator has the kubernetesProvider field set if a provider was auto detected, so no one would need to explicitly set the field.

We would keep the auto detection of the provider for the purpose of defaulting the kubernetesProvider field, but when the Installation is reconciled it would set the kubernetesProvider to the auto detected provider only if it was not already set.
But from then on the auto detected value would be ignored and the kubernetesProvider would be the authority for the provider.

In our case, we installed first the operator and created the installation, after that we in stalled fqdnnetworkpolicies.networking.gke.io. The operator was running without problems till the first restart of the pod.

In this scenario, what Casey and I are suggesting is that the operator would have updated the Installation resource and set the kubernetesProvider to the auto detected value of EKS (this is how it functions currently). Then when the operator was restarted, even though you installed fqdnnetworkpolicies.networking.gke.io and auto detection would have determined GKE, the operator would ignore the auto detected GKE and use the kubernetesProvider already set in the Installation resource which was EKS. So the suggested solution would have ensured you never would have had an issue.

Lets consider if you had the fqdnnetworkpolicies... before deploying the operator case, then the operator would have incorrectly detected GKE, with your current suggestion you would need to update the kuberentesProvider field and modify the deployment of the operator. Then in the future when you update the operator you would need to make the same modification to the operator deployment. With what I'm proposing you would only need to update the kubernetesProvider to EKS, then the operator would see that and use that as the provider and any future updates to the operator deployment would need no modifications.

@danudey danudey modified the milestones: v1.32.0, v1.32.1, v1.32.2 Dec 2, 2023
@danudey danudey modified the milestones: v1.32.2, v1.32.3 Dec 15, 2023
@danudey danudey modified the milestones: v1.32.3, v1.32.4 Jan 18, 2024
@radTuti radTuti modified the milestones: v1.32.4, v1.32.5, v1.32.6 Feb 16, 2024
@radTuti radTuti modified the milestones: v1.32.6, v1.32.7 Mar 25, 2024
@danudey danudey modified the milestones: v1.32.7, v1.32.8 Apr 2, 2024
@radTuti radTuti modified the milestones: v1.32.8, v1.32.9 Apr 26, 2024
@danudey danudey modified the milestones: v1.32.9, v1.32.10 Jun 14, 2024
@danudey danudey modified the milestones: v1.32.10, v1.32.11 Jul 9, 2024
@danudey danudey modified the milestones: v1.32.11, v1.32.12 Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AutoDiscoverProvider leads to wrong result
7 participants