
Managed Kubernetes in CAPI #7494

Closed
richardcase opened this issue Nov 4, 2022 · 21 comments · Fixed by #8500
Assignees
richardcase, jackfrancis, pydctw
Labels
area/api: Issues or PRs related to the APIs
kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API
kind/design: Categorizes issue or PR as related to design.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@richardcase
Member

/kind design
/kind api-change
/area api

User Story

As a cluster service consumer, I want to use Cluster API to provision and manage Kubernetes clusters that utilize my service provider's managed Kubernetes service (e.g. EKS, AKS, GKE), so that I don't have to worry about the management/provisioning of control plane nodes, and so I can take advantage of any value-add services offered by the service provider.

Detailed Description

Let's take another look at how managed Kubernetes services can be represented in CAPI and its providers.

The original Managed Kubernetes in CAPI proposal, based on guidance from the community at the time, explicitly kept changes to CAPI itself out of scope and focused purely on fitting managed Kubernetes into the existing API types. The existing proposal is useful in the short term to provide guidance to provider implementers.

However, it has become apparent that the recommendations made in the proposal, whilst helpful, are less than ideal when it comes to managed Kubernetes. Some problem areas (not an exhaustive list):

  • It's assumed that the infra cluster provisions & reports the control plane endpoint. However, for managed k8s, an API server endpoint is usually created as part of creating the service in the cloud provider (see the sketch after this list).
  • Is there a difference between infra cluster and control plane when it comes to managed k8s? Arguably not, so do we need both constructs?
  • Some services need multiple kubeconfigs.
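
To illustrate the first point, here is a minimal sketch (in Go, assuming the current infra cluster contract) of the shape a provider's infra cluster type takes today; `FooCluster` and its fields are illustrative placeholders rather than a real provider type:

```go
// Hypothetical infra cluster type, sketched only to show the current contract:
// CAPI expects the infra provider to populate spec.controlPlaneEndpoint and
// flip status.ready. For managed Kubernetes the endpoint is only known after
// the cloud service has created the control plane, which inverts this flow.
package v1example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// FooClusterSpec is an illustrative infra cluster spec.
type FooClusterSpec struct {
	// ControlPlaneEndpoint is what CAPI reads to discover the API server.
	// A self-managed provider can set this up front (e.g. a load balancer it
	// creates); a managed provider only learns it from the cloud API later.
	ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint,omitempty"`
}

// FooClusterStatus is an illustrative infra cluster status.
type FooClusterStatus struct {
	// Ready signals that infrastructure provisioning is complete.
	Ready bool `json:"ready"`
}

// FooCluster is the illustrative infra cluster resource.
type FooCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              FooClusterSpec   `json:"spec,omitempty"`
	Status            FooClusterStatus `json:"status,omitempty"`
}
```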

So, we should take another look at how we represent managed Kubernetes in CAPI and include the option of changing CAPI itself to represent managed k8s as a first-class citizen.

It's expected that this will result in a "Managed Kubernetes in CAPI v2" proposal.

Anything else you would like to add:

We should still consider the recommendations made in the original proposal (i.e. option 3), along with any new options that are a result of looking at changes to capi itself. We can then consider the trade-offs and decide a longer-term strategy for managed Kubernetes in capi.

Short-term, the 3 main cloud providers could achieve consistency quickly by going with option 2 from the proposal. Although not the recommended approach, it was still called out as an option. However, the Oracle provider implemented their managed service according to the document and went with option 3, so we need to consider this as well.

Some related issues that we can also consider as part of this work:

/assign richardcase
/assign jackfrancis
/assign pydctw

@k8s-ci-robot k8s-ci-robot added the kind/design Categorizes issue or PR as related to design. label Nov 4, 2022
@k8s-ci-robot k8s-ci-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API area/api Issues or PRs related to the APIs labels Nov 4, 2022
@k8s-ci-robot
Contributor

@richardcase: This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 4, 2022
@pydctw

pydctw commented Nov 4, 2022

cc @joekr @shyamradhakrishnan, CAPOCI team

@fabriziopandini
Member

fabriziopandini commented Nov 4, 2022

/triage accepted
great to see this moving forward!

Reporting a discussion from KubeCon with regard to:

Is there a difference between infra cluster and control plane when it comes to managed k8s? Arguably not, and so do we need both constructs?

There could be cases where an infra cluster could be used to provide something on top of "vanilla managed Kubernetes"; people were quoting examples such as private clusters or additional security groups, if I remember well.

People also agreed that making the infra cluster optional could make sense, while reusing the same CR in two places seems less intuitive.

(please, other folks present at the meeting feel free to chime in)

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Nov 4, 2022
@fabriziopandini fabriziopandini removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 4, 2022
@jackfrancis
Contributor

@richardcase thank you for this, very succinctly stated and with all the important items. 🙏

I would like to call out how this proposal affects our "consistency across cloud providers" objective. You mention in the description several real-world issues that make consistency a challenge in the current landscape (in practice it would mean all cloud providers committing to either option 2 or 3 as defined in the current proposed set of recommendations).

I'd like to get some consensus from folks around the viability of pinning the "consistency across cloud providers" objective to the landing and eventual implementation of this new proposal instead of the already landed proposal. This would suggest a few obvious outcomes:

  • in the near term, cloud providers implementing (or re-implementing) Managed Kubernetes would make a best effort to adhere to either option 2 or 3 here according to provider-specific (and that provider's customers') criteria.
  • consistency-as-a-goal means that this new proposal would accept a higher standard of cross-provider affirmation as part of the specification gathering and finalization process: this new spec will in practice be an "API requirement" for implementing provider Managed Kubernetes in Cluster API, so we want to make an extra effort to involve the input of all providers in the ecosystem
  • consistency-as-a-goal would be explicitly delayed for the Cluster API Managed Kubernetes community until this new proposal is accepted, implemented, and released (perhaps, e.g., as part of a v1 API of Cluster API)
  • we probably will also want to include in the scope of this effort some provider recommendations for migrating the current, existing Managed Kubernetes solutions onto the eventual Cluster API-blessed solution; by the time this consistent API is available to the community, we should anticipate that many customers will have already built platforms on top of the previous provider-specific Managed Kubernetes APIs

From the capz community: @mtougeron @zmalik @luthermonson @LochanRn @NovemberZulu (there are others as well), do you have any opinions on a willingness to adopt an eventual, Cluster API-specific Managed Kubernetes standard that will be consistent across cloud providers at a future date according to a new API? Does such a strategy have downsides to current platform adoption?

cc @CecileRobertMichon @nojnhuh

@pydctw

pydctw commented Nov 4, 2022

Great write up, @richardcase.

I also want to add what I heard from managed Kubernetes customers in the community. As pointed out in @jackfrancis's CAPZ Managed Kubernetes evolution proposal, the typical customer persona is first of all a Cluster API, multi-cloud, Managed Kubernetes customer, and what they care about is consistency across Managed Kubernetes offerings. They said they don't really expect consistency between managed and unmanaged Kubernetes in CAPI.

Considering managed Kubernetes can utilize many value-added services offered by cloud providers (e.g. managed addons with easy installation, built-in health checks, and scaling capabilities), I think the managed Kubernetes proposal should focus on making it easy to bring in managed services' ever-increasing capabilities instead of trying to fit into current CAPI contracts that were designed for unmanaged Kubernetes.

@fabriziopandini
Member

fabriziopandini commented Nov 4, 2022

They said they don't really expect consistency between managed and unmanaged Kubernetes in CAPI.

I think the managed Kubernetes proposal should focus on making it easy to bring in managed services' ever-increasing capabilities instead of trying to fit into current CAPI contracts that were designed for unmanaged Kubernetes.

Frankly speaking, I think we should be really careful in taking this path.
Most of the value of Cluster API is having a common abstraction over different types of infrastructure, enabling higher-level tools to build truly multi-cluster/multi-infrastructure in a very effective way.

So, in my humble opinion, this task should be about how to embrace managed Kubernetes while preserving a common abstraction (that of course can be improved from the current state), not about inventing a new abstraction.

@richardcase
Member Author

I agree @fabriziopandini 👍 The desire to keep managed / unmanaged as close as possible in the long term will be carefully considered. I think there are ways to do that whilst still being more implementation-friendly to managed provider implementers.

@jackfrancis
Contributor

The challenge here is that we have a trade-off in terms of what to prioritize for "close"-ness.

  1. self-managed clusters and managed clusters on capi share a common API
  • in this scenario we will probably have to accept that that common API will behave slightly differently between managed and unmanaged
  2. preserve existing capi behavior for self-managed clusters
  • in this scenario we are probably nudged towards a dedicated API for managed (probably as simple as a single CRD, but still it would be dedicated to managed k8s; a hypothetical sketch of this is below)

I'm not sure what the right answer is, looking forward to input from the community!
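
To make the second option concrete, here is a purely hypothetical sketch of a single dedicated managed-cluster CRD; none of these type or field names exist in CAPI or in any provider today:

```go
// Hypothetical single CRD for managed Kubernetes, merging what the infra
// cluster and control plane objects do today. Purely illustrative.
package v1example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// FooManagedClusterSpec collects settings that currently span two objects.
type FooManagedClusterSpec struct {
	// Version of the managed control plane requested from the provider.
	Version string `json:"version"`
	// PrivateCluster is an example of network configuration that would
	// otherwise live on the infra cluster.
	PrivateCluster bool `json:"privateCluster,omitempty"`
}

// FooManagedClusterStatus reports back what the cloud service created.
type FooManagedClusterStatus struct {
	// Ready is true once the provider reports the control plane as usable.
	Ready bool `json:"ready"`
	// ControlPlaneEndpoint is reported by the provider rather than
	// provisioned by a separate infra cluster.
	ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint,omitempty"`
}

// FooManagedCluster is the illustrative dedicated managed cluster resource.
type FooManagedCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              FooManagedClusterSpec   `json:"spec,omitempty"`
	Status            FooManagedClusterStatus `json:"status,omitempty"`
}
```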

@shyamradhakrishnan

@jackfrancis we need to understand the flaws of the current managed cluster CAPI proposal. It did tie in nicely to CAPI APIs such as control plane, machine pool, etc., and also to Kubernetes constructs. We in OCI did not find any huge problems implementing using the proposal. Of course there is a minor naming problem in that the control plane translates to the Cluster object in the cloud provider, but technically the clusters (for example OKE/AKS) really are running managed control planes.
If you did not have to migrate and were starting afresh, would you have been OK with the proposal? I would suggest we think about that, since we had spent quite a bit of time on the proposal, so anything new should have proper justification. Another point to consider is whether the current proposal can be modified to answer all the concerns.

Answering some questions above: do we need both infra cluster and control plane? That's how current CAPI works, and we can question that as well, right? For example, in the current implementation the infra cluster creates all the network infrastructure and the KubeadmControlPlane creates the control plane nodes. We are doing the same thing with our implementation: the infra cluster creates the network and any other infra, and the control plane creates the managed control plane (which is called a cluster by most cloud providers).
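
Roughly, that split looks like the following sketch of a CAPI Cluster object; the referenced kinds and API versions are placeholders for whatever the provider actually defines:

```go
// Sketch of how the current two-object split is wired into a CAPI Cluster:
// one reference for network/other infrastructure, one for the managed
// control plane (what the cloud provider itself calls a "cluster").
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func exampleCluster() *clusterv1.Cluster {
	return &clusterv1.Cluster{
		ObjectMeta: metav1.ObjectMeta{Name: "my-cluster", Namespace: "default"},
		Spec: clusterv1.ClusterSpec{
			// Infra cluster: VCN/VPC, subnets, security groups, and so on.
			InfrastructureRef: &corev1.ObjectReference{
				APIVersion: "infrastructure.cluster.x-k8s.io/v1beta1", // placeholder
				Kind:       "ProviderManagedCluster",                  // placeholder
				Name:       "my-cluster",
			},
			// Control plane: the provider-managed control plane (e.g. OKE).
			ControlPlaneRef: &corev1.ObjectReference{
				APIVersion: "infrastructure.cluster.x-k8s.io/v1beta1", // placeholder
				Kind:       "ProviderManagedControlPlane",             // placeholder
				Name:       "my-cluster",
			},
		},
	}
}
```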

We can go through our implementation and API in a meeting if required. I apologise if I did not understand the problem statement here or did not explain it properly.

@jackfrancis
Contributor

Hi @shyamradhakrishnan thanks so much for adding your thoughts. I think that @alexeldeib would agree (though he can speak for himself :) as the original implementer of capz + AKS) that there were not huge problems (after all capz and it seems capoci have built a robust user community around the current implementations), but there is some friction, and I'd say @richardcase states the key points very succinctly, so I'll simply copy them:

  • It's assumed that the infra cluster provisions & reports the control plane endpoint. However, for managed k8s, an api server endpoint is usually created as part of creating the service in the cloud provider.
  • Is there a difference between infra cluster and control plane when it comes to managed k8s? Arguably not, and so do we need both constructs?
  • Some services have the need for multiple kubeconfigs

If we were starting a brand new managed cluster implementation right now for the near term, we would 💯 follow option 3 as specified in kubernetes-sigs/cluster-api-provider-azure#2739. I was one of the reviewers of that proposal and I stand by my lgtm. The concerns stated in this particular issue are scoped to solving this at an even higher fidelity for the longer term. My observation is that the very existence of this issue suggests the community desires a better solution to achieve managed k8s standardization longer term; thus the current proposal does not address these longer-term concerns (indeed the proposal was not scoped to include changes to Cluster API itself).

thats how current CAPI works and we can question that as well right

I agree. We should continue to discuss the tradeoff of "forcing the managed provider to adhere to Cluster API" vs "forcing Cluster API to adhere to managed providers" (I'm exaggerating a bit, but that's the simplest way of putting it).

The key themes from the folks starting this thread (@richardcase, @pydctw, and myself) are:

  1. It is a lot of work to convert the existing capa and capz managed k8s implementations to option 3, with basically no user value. There is user value in eventually getting every Cluster API provider to implement a common "managed k8s" interface, but in reality we can only deliver that value at significant cost to the existing user community (these changes will be breaking).
  2. Option 3 in the CAPZ Managed Kubernetes evolution proposal (cluster-api-provider-azure#2739) still falls a bit short of a "north star" standard managed k8s proposal (see the bullet points I pasted from @richardcase above).

Based on those 2 points (of course, let's emphasize that these are the views of a minority at present; consensus has not yet been achieved), it seems likely that the community will indeed eventually want to solve this problem from first principles, and thus we would anticipate breaking our customers (capa and capz, at least) twice if we took on the work of migrating to option 3 from kubernetes-sigs/cluster-api-provider-azure#2739. (The very explicit language of "you don't have to do this if you have an existing managed k8s implementation" is purposeful, as the authors of that proposal anticipated this possibility.)

Hope that helps. I definitely agree that we will want to discuss this in at least one meeting to clarify everyone's desires/needs/expectations/concerns.

@shyamradhakrishnan

shyamradhakrishnan commented Nov 8, 2022

Sounds good @jackfrancis. One small caveat is that capa needs to be modified to support ClusterClass, as the current implementation does not support it. But MachinePool also does not support ClusterClass, so...

@puja108
Member

puja108 commented Nov 8, 2022

There could be cases where an infra cluster could be used to provide something on top of "vanilla managed Kubernetes", people were quoting some examples about private clusters or additional security groups if I remember well

I mentioned something along those lines at KubeCon. My main point was that the configurability options present in the managed Kubernetes solutions out there should also be possible with our CAPI solution for them, for two reasons:

  1. If our configurability is too low and limited to some generic cases, people will have good reasons not to use CAPI and instead go with solutions like the one from Cruise's KubeCon NA keynote, where they built their own abstraction with Crossplane.
  2. If coverage of options is too low, we will run into many blockers for future roadmap stories like adoption of existing managed Kubernetes clusters with CAPI.

@jackfrancis
Contributor

@puja108 What part of a hypothetical capi ManagedCluster spec + an implemented <Provider>ManagedCluster spec would limit the types of configuration discussed in this thread?

@puja108
Member

puja108 commented Dec 1, 2022

Sorry Jack, this issue somehow got lost in my backlog.

I did not mean the comment with regard to any current or planned spec. My worries that I mentioned at KCSNA came more from talking to end users at KubeCon and in our virtual end user meetups, where I was seeing a curious rise in end users looking at Cluster API, maybe even trying it out, but then going for a self-built Crossplane-based solution in combination with Managed Control Planes like EKS and AKS. And the three end user companies I asked for the main reason behind this choice mentioned that they had special requirements for configurability or features that they would have had to implement around CAPI, e.g. private network support.

If this is irrelevant wrt the current proposal, then I might have gone off-topic for this thread.

@richardcase
Member Author

the three end user companies I asked for the main reason behind this choice mentioned that they had special requirements for configurability or features that they would have had to implement around CAPI, e.g. private network support

@puja108 - are these requirements captured anywhere? Would be great to see if we can cover these in CAPI.

@puja108
Member

puja108 commented Dec 1, 2022

Sadly I did not do any structured requirements gathering. I was mainly poking people at events and getting very short feedback. I think it would be worth it to gather requirements as a group in a more structured way.
As for private network support, I think some of this has been brought up at the CAPA (and soon CAPZ) level by some of my colleagues, and we internally have very specific needs from certain customers, which I hope we can slowly bring upstream once the stressful PoC phases are over. I'll talk to the teams to see when and how we can bring things back upstream again.

@jackfrancis
Contributor

Thanks for weighing in @puja108.

At first glance it seems like these configuration requirements are provider-specific, but we'd love for you to join our regular feature group discussions around managed k8s in Cluster API, your time permitting. Meeting details are in the markdown file in this PR, which is en route to the capi main branch:

#7546

@jackfrancis
Contributor

Just a quick status update that this CAEP is near to reaching definitional consensus:

Following that, an implementation effort will commence to carry out the work proposed above.

@jackfrancis
Contributor

Follow-up status update: last minute scope enhancement under consideration!

Specifically, @vincepri would like us to consider a new CRD definition for the control plane endpoint, to support any new behavioral flexibility we add on top of the existing assumptions about an infra provider's cluster and control plane resources. This may also help unblock forward progress on a long-standing request to support multiple control plane endpoints: #5295

See: https://kubernetes.slack.com/archives/C8TSNPY4T/p1686781200032249
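
To make the idea concrete, here is a purely hypothetical sketch of such an endpoint CRD; none of these names exist in CAPI today, and it also shows how multiple endpoints (e.g. public and private) could be modelled:

```go
// Hypothetical CRD carrying one or more control plane endpoints, decoupled
// from the infra cluster object. Purely illustrative.
package v1example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// EndpointRole distinguishes how an endpoint is meant to be reached.
type EndpointRole string

const (
	EndpointRolePublic  EndpointRole = "Public"
	EndpointRolePrivate EndpointRole = "Private"
)

// NamedEndpoint pairs a role with a host/port.
type NamedEndpoint struct {
	Role     EndpointRole          `json:"role"`
	Endpoint clusterv1.APIEndpoint `json:"endpoint"`
}

// ControlPlaneEndpointSpec lists every reachable API server address;
// today CAPI assumes exactly one per cluster.
type ControlPlaneEndpointSpec struct {
	Endpoints []NamedEndpoint `json:"endpoints"`
}

// ControlPlaneEndpoint is the illustrative endpoint resource.
type ControlPlaneEndpoint struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ControlPlaneEndpointSpec `json:"spec,omitempty"`
}
```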

@shyamradhakrishnan

@jackfrancis this will benefit OKE and other managed providers as well, I think. For example, OKE supports private and public endpoints (https://docs.oracle.com/en-us/iaas/api/#/en/containerengine/20180222/datatypes/ClusterEndpoints), and I am pretty sure other providers will also have this. Even in non-managed cases, a load balancer can have public and private endpoints. So theoretically, this will have a lot of benefits.

@sbueringer
Member

@jackfrancis Q: Is this issue done now that #8500 is merged? I assume we would want to track the implementation; maybe we should create a new umbrella issue for the implementation (which would include the tasks from the PR description: #8500 (comment))?
