Dissociating Machines from Kubernetes #721

Closed
maisem opened this issue Jan 30, 2019 · 23 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments


maisem commented Jan 30, 2019

Today, when we talk about machines, we implicitly mean that they represent Kubernetes Nodes. As such, we have fields in the MachineSpec like MachineVersionInfo.

I don't think this is the right abstraction, i.e. there is no reason for a Machine to equate to a Kubernetes Node. Dissociating the two would make the Machines API standalone and more robust, especially in light of #490.

The dissociation would also help the independent adoption of the Machines API, as it could then be used for managing non-containerized workloads (e.g. legacy applications).

One way of doing this would be to add a reference to the cloud-init script in the MachineSpec. We would introduce a new type called KubernetesMachine, which would contain the equivalent of MachineVersionInfo and would internally create the MachineDeployment resource with the corresponding cloud-init scripts for the desired Kubernetes version.
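A minimal sketch of what that split could look like. The field and type names below (CloudInitScriptRef, KubeletVersion, ControlPlaneVersion) are hypothetical placeholders, not existing Cluster API types:

```go
// Hypothetical sketch only; these types do not exist in the Cluster API today.
package sketch

// MachineSpec stripped of Kubernetes-specific fields: the only software
// provisioning hook left is a reference to a cloud-init script.
type MachineSpec struct {
	// CloudInitScriptRef names a ConfigMap/Secret holding the cloud-init
	// script used to provision software on this machine (hypothetical field).
	CloudInitScriptRef string `json:"cloudInitScriptRef,omitempty"`
	// Provider-specific infrastructure configuration would remain here.
}

// KubernetesMachineSpec carries the equivalent of today's MachineVersionInfo.
// Its controller would create the MachineDeployment resource with the
// cloud-init scripts matching the requested Kubernetes version.
type KubernetesMachineSpec struct {
	KubeletVersion      string `json:"kubeletVersion"`
	ControlPlaneVersion string `json:"controlPlaneVersion,omitempty"`
}
```

With that shape, a Machine knows nothing about Kubernetes, and everything Kubernetes-specific lives behind the KubernetesMachine controller.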

cc @roberthbailey @dlipovetsky @justinsb


dlipovetsky commented Jan 30, 2019

Managing legacy applications is one motivation, but here is a different one:

Today, a machine actuator provisions both the machine (e.g. by calling the EC2 API) and the software (e.g. by invoking kubeadm init). To date, most providers end up re-implementing software provisioning in the same way, which suggests it could be refactored into a common piece shared across providers.

One way to provide this common piece is by splitting the machine actuator into two actuators, one for machine provisioning, another for software provisioning. A Machine object without Kubernetes-related fields could be a good match for the first actuator, and an object like KubernetesMachine could be a good match for the second actuator.

This works for environments where software provisioning follows machine provisioning. In other environments, machine and software provisioning happen together (e.g. the latter is done as a cloud-init script injected during machine provisioning). That's where the "cloud-init script" field can be used.
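To make the split concrete, here is a rough sketch of what the two actuators' interfaces could look like. The names and signatures are illustrative, not the existing actuator interface:

```go
// Illustrative sketch; the real machine actuator is not split this way today.
package sketch

import "context"

// Machine stands in for the Cluster API Machine object.
type Machine struct{}

// InfrastructureActuator provisions the machine itself,
// e.g. by calling the EC2 API.
type InfrastructureActuator interface {
	Create(ctx context.Context, m *Machine) error
	Delete(ctx context.Context, m *Machine) error
}

// SoftwareActuator provisions software on an existing machine,
// e.g. by invoking kubeadm init/join; this piece could be implemented
// once and shared across providers.
type SoftwareActuator interface {
	Provision(ctx context.Context, m *Machine) error
}
```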


derekwaynecarr commented Jan 30, 2019

I agree with @maisem; we generally prefer to separate the definition of a machine from the kubelet software version. Making the machine<->node linkage optional is a preferred outcome for us.

@derekwaynecarr
Contributor

Adding @enxebre for his input as well.

@timothysc timothysc added this to the Next milestone Jan 30, 2019
@timothysc timothysc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 30, 2019

davidewatson commented Jan 30, 2019

Historically, the Cluster API has avoided making assumptions about infrastructure and software provisioning, except that I think we have always assumed we are provisioning Kubernetes clusters and not arbitrary resources. Looking at Machine[Spec|Status], all of the common fields are related to Kubernetes Nodes (cf. NodeRef, ProviderID, etc.). When I imagine an infrastructure-only MachineSpec, I wonder whether there will be any common fields.
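For illustration, the Kubernetes-facing common fields look roughly like the following. This is a sketch only; the exact placement (Spec vs Status) and types vary by API version:

```go
// Sketch of the Kubernetes-facing fields; not the actual Cluster API definitions.
package sketch

import corev1 "k8s.io/api/core/v1"

type MachineStatus struct {
	// NodeRef points at the Kubernetes Node backed by this Machine.
	// It has no meaning for a machine that is not a Node.
	NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"`
}

type MachineSpec struct {
	// ProviderID links the Machine to the cloud provider instance and is
	// used to match Machines to Nodes (e.g. by the cluster autoscaler).
	ProviderID *string `json:"providerID,omitempty"`
}
```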

Instead of creating a common controller for the Machine infrastructure provisioning bits, what if we used webhooks (or provider-specific controllers) to abstract out the infrastructure provisioning and kept the Machine actuator for software (and optionally hardware) provisioning of Kubernetes? That way the common Machine resource would still be useful for higher-level tools.

@detiber, @dlipovetsky: ^^


detiber commented Jan 30, 2019

I can see attempts to separate k8s bootstrapping from machine provisioning as a way to leverage commonalities, similar to what @dlipovetsky mentions. That said, I definitely worry about extending the scope of the Machine actuator to support managing machines that are not part of a k8s cluster.

@derekwaynecarr If you remove the ability to link machines to nodes, then don't you also lose the ability to leverage higher level abstractions like the cluster autoscaler?

@alex-mohr

Generally a huge +1 to decoupling, but before designing it, I'll note that there are at least four components that could be split out: {physical,virtual} machine management, base OS management, shared inside-the-OS software, and app-specific software.

So before we go much further, I'd suggest we come up with some specific user-focused cases that we decide should be supported (or explicitly not supported). What is the problem we're solving for users?

(1) When we say arbitrary machines, what do we mean by that? I'd love a world where one could use k8s Machine APIs to manage e.g. a virtual or physical machine that runs an arbitrary OS (linux, windows, *bsd, plan 9, whatever). Another version of arbitrary is "any arbitrary OS as long as it's a linux that uses cloud-init". Which is it?

(2) If I want to run MS SQL Server on Windows, can I do that?

(3) If my company has a separate machine management team, a separate company-wide Linux OS team, and I'm an App Dev on a third team, do the various Cluster and Machine APIs cleanly decompose into objects (or at least responsibilities) that map to their responsibilities?


detiber commented Jan 30, 2019

@davidewatson I do like the idea of making the software provisioning/k8s bootstrapping more reusable, but I also don't want to fully preclude the ability to do the provisioning/bootstrapping through cloud-init or other methods that wouldn't require connecting to the hosts via a remote session (or 100% full decoupling).


vincepri commented Jan 30, 2019

This issue sounds a little out of scope for the cluster-api project, at least from my point of view.

Decoupling Machines from Kubernetes Nodes effectively means that cluster-api should now support a variety of use cases which have different requirements, therefore increasing the scope exponentially.

On the other hand, I do like the idea of having a pluggable interface for software provisioning, but I think that should be captured in a different issue.


detiber commented Jan 30, 2019

> Generally a huge +1 to decoupling, but before designing it, I'll note that there are at least four components that could be split out: {physical,virtual} machine management, base OS management, shared inside-the-OS software, and app-specific software.

@alex-mohr this sounds good in theory, but how do you separate machine management from base OS management? After provisioning a machine (at least in the majority of cloud environments) you have a base OS.

> (3) If my company has a separate machine management team, a separate company-wide Linux OS team, and I'm an App Dev on a third team, do the various Cluster and Machine APIs cleanly decompose into objects (or at least responsibilities) that map to their responsibilities?

I think you can have some items that are split amongst different teams (such as re-using networking, security groups, etc), but I think the idea of extending that separation of concerns down into the actual spinning up and bootstrapping of hosts into a k8s cluster is taking it a bit too far.


pablochacin commented Jan 30, 2019

I think separating the machine provisioning from the software provisioning implies there's a standard way for the software provisioning controller to connect to the machine. As @detiber said, that might not be necessary or even possible in all cases.


enxebre commented Jan 31, 2019

Related to #683
We currently have a clear separation between machine and software/kubelet. The software/kubelet is managed orthogonally, hence seeing version fields in the spec is a bit confusing. We inject a cloud-init/Ignition endpoint via the actuator userdata API field, which then serves the config. Software upgrades/lifecycle management are then oblivious to this API. We still have the ability to match machines to specific nodes, so we can leverage it for health checking, autoscaling, etc.
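A minimal sketch of that pattern. The field name below is hypothetical; each provider's actual spec differs:

```go
// Hypothetical sketch; each provider's actual provider spec differs.
package sketch

// ProviderSpec carries only a pointer to where the instance fetches its
// cloud-init/Ignition config at first boot; kubelet/software versions and
// their lifecycle are managed outside this API.
type ProviderSpec struct {
	UserDataEndpoint string `json:"userDataEndpoint,omitempty"`
}
```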

@dlipovetsky
Contributor

While I think the idea @maisem proposes is worth discussing, I agree with @vincepri that it implies significant "feature creep" and thus can be seen as out of scope.

I'm glad we are reflecting on the fact that machine and software provisioning are currently coupled. I recognize that coupling the two (e.g. cloud-init) can be desirable or necessary. But decoupling is sometimes necessary, e.g. in an environment without an infrastructure API, where software must be provisioned after the machine is. Decoupling can also be desirable, if it makes it easy to re-use the software provisioning piece.

I agree with @detiber that we should consider both use cases, and not push one at the expense of the other.

I agree with @dwat that webhooks may in fact work to decouple the two while keeping the existing machine actuator. But the project is still young; I think it's worth considering alternative actuator designs, too.

@dlipovetsky
Contributor

> I think separating the machine provisioning from the software provisioning implies there's a standard way for the software provisioning controller to connect to the machine.

@pablochacin I think that keeping the two together also implies a standard; the de facto one is cloud-init.

But your comment implies that, if we want to make "software provisioning" something common for providers to easily consume, we'll need to support both the coupled and decoupled use cases. For example, we could maintain standard cloud-init scripts and a standard library for bootstrapping with SSH.
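As a sketch of what supporting both paths behind one reusable piece could look like (all names here are hypothetical, not an existing library):

```go
// Hypothetical sketch of a shared bootstrapping library supporting both paths.
package sketch

import "context"

// Bootstrapper provisions Kubernetes software for one machine.
type Bootstrapper interface {
	Bootstrap(ctx context.Context, address, kubernetesVersion string) error
}

// CloudInitBootstrapper covers the coupled case: it renders a standard
// cloud-init script that the provider injects as user data at creation time.
type CloudInitBootstrapper struct{}

func (CloudInitBootstrapper) Bootstrap(ctx context.Context, address, kubernetesVersion string) error {
	// Render the cloud-init script; nothing to do after the machine boots.
	return nil
}

// SSHBootstrapper covers the decoupled case: it connects to an
// already-running machine and runs the same bootstrap steps remotely.
type SSHBootstrapper struct{}

func (SSHBootstrapper) Bootstrap(ctx context.Context, address, kubernetesVersion string) error {
	// Open an SSH session to the address and run the bootstrap steps there.
	return nil
}
```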


vincepri commented Feb 1, 2019

We should discuss at the next meeting and see what we, as a group, want to do regarding this issue. In the meantime, I'd open a new issue to offer software provisioning tools / methodologies which would be responsible for installing Kubernetes binaries, etc.


detiber commented Feb 1, 2019

Instead of discussing this at the next meeting, maybe we should use the time to formalize a proposal period for after v1alpha1, where major changes like this can be discussed?


davidewatson commented Feb 11, 2019

After hearing from others, I withdraw my undeleted comment. There are valid use cases for creating non-k8s VMs.

@pablochacin
Contributor

> @pablochacin I think that keeping the two together also implies a standard; the de facto one is cloud-init.

Not really, @dlipovetsky. Provisioning together means that we have a standard way to communicate with the provisioner and ask for a new instance to join the cluster, but not necessarily with the machine itself. We don't need, for instance, to manage any kind of credentials for accessing the machines, do we?

Maybe I'm misunderstanding something in the meaning of this paragraph:

> Today, a machine actuator provisions both the machine (e.g. by calling the EC2 API) and the software (e.g. by invoking kubeadm init).

That said, the idea of having a cloud-init reference in the spec, as suggested in @maisem's initial comment, makes a lot of sense. How it is used should depend on the actuator. However, even a cloud-init script may be dependent on the OS of the machine.

@pablochacin
Contributor

Maybe this work in the gardener project is relevant for this discussion:
Cloud config (user-data) for bootstrapping machines

> Gardener will continue to keep knowledge about the content of the cloud config scripts, but it will hand it over to the respective OS-specific controller, which will generate the specific valid representation.

This is the implementation of this concept for CoreOS machines:
https://github.com/gardener/gardener-extensions/tree/master/controllers/os-coreos

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 13, 2019
@timothysc timothysc self-assigned this Jun 14, 2019
@timothysc timothysc modified the milestones: Next, v1alpha2 Jun 14, 2019
@timothysc timothysc added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 14, 2019

ncdc commented Aug 23, 2019

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: v1alpha2, Next Aug 23, 2019
@vincepri
Member

We can close out this issue; we've discussed it at length during Q1/Q2 and stated that this is a non-goal for this project, as outlined in https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/scope-and-objectives.md#non-goals.

/close

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.

In response to this:

> We can close out this issue; we've discussed it at length during Q1/Q2 and stated that this is a non-goal for this project, as outlined in https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/scope-and-objectives.md#non-goals.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jayunit100 pushed a commit to jayunit100/cluster-api that referenced this issue Jan 31, 2020
HAProxy load balancer OVA updates