Dissociating Machines from Kubernetes #721

Closed
maisem opened this issue Jan 30, 2019 · 23 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments


maisem commented Jan 30, 2019

Today, when we talk about machines, we implicitly mean that they represent Kubernetes Nodes. As such, we have fields in the MachineSpec like MachineVersionInfo.

I don't think this is the right abstraction, i.e. there is no reason for a Machine to equate to a Kubernetes Node. Dissociating the two would make the Machines API standalone and more robust, especially in light of #490.

The dissociation would also help the independent adoption of the Machines API, as it could then be used for managing non-containerized workloads (e.g. legacy applications).

One way of doing this would be to add a reference to the cloud-init script in the MachineSpec. We would introduce a new type called KubernetesMachine, which would contain the equivalent of MachineVersionInfo and would internally create the MachineDeployment resource with the corresponding cloud-init scripts for the desired Kubernetes version.
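A minimal sketch of what that split could look like. The field and type names below (CloudInitScriptRef, KubeletVersion, ControlPlaneVersion) are hypothetical placeholders, not existing Cluster API types:

```go
// Hypothetical sketch only; these types do not exist in the Cluster API today.
package sketch

// MachineSpec stripped of Kubernetes-specific fields: the only software
// provisioning hook left is a reference to a cloud-init script.
type MachineSpec struct {
	// CloudInitScriptRef names a ConfigMap/Secret holding the cloud-init
	// script used to provision software on this machine (hypothetical field).
	CloudInitScriptRef string `json:"cloudInitScriptRef,omitempty"`
	// Provider-specific infrastructure configuration would remain here.
}

// KubernetesMachineSpec carries the equivalent of today's MachineVersionInfo.
// Its controller would create the MachineDeployment resource with the
// cloud-init scripts matching the requested Kubernetes version.
type KubernetesMachineSpec struct {
	KubeletVersion      string `json:"kubeletVersion"`
	ControlPlaneVersion string `json:"controlPlaneVersion,omitempty"`
}
```

With that shape, a Machine knows nothing about Kubernetes, and everything Kubernetes-specific lives behind the KubernetesMachine controller.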

cc @roberthbailey @dlipovetsky @justinsb


dlipovetsky commented Jan 30, 2019

Managing legacy applications is one motivation, but here is a different one:

Today, a machine actuator provisions both the machine (e.g. by calling the EC2 API) and the software (e.g. by invoking kubeadm init). To date, most providers end up re-implementing software provisioning in the same way, which suggests it could be refactored into a common piece shared across providers.

One way to provide this common piece is by splitting the machine actuator into two actuators, one for machine provisioning, another for software provisioning. A Machine object without Kubernetes-related fields could be a good match for the first actuator, and an object like KubernetesMachine could be a good match for the second actuator.

This works for environments where software provisioning follows machine provisioning. In other environments, machine and software provisioning happen together (e.g. the latter is done as a cloud-init script injected during machine provisioning). That's where the "cloud-init script" field can be used.
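To make the split concrete, here is a rough sketch of what the two actuators' interfaces could look like. The names and signatures are illustrative, not the existing actuator interface:

```go
// Illustrative sketch; the real machine actuator is not split this way today.
package sketch

import "context"

// Machine stands in for the Cluster API Machine object.
type Machine struct{}

// InfrastructureActuator provisions the machine itself,
// e.g. by calling the EC2 API.
type InfrastructureActuator interface {
	Create(ctx context.Context, m *Machine) error
	Delete(ctx context.Context, m *Machine) error
}

// SoftwareActuator provisions software on an existing machine,
// e.g. by invoking kubeadm init/join; this piece could be implemented
// once and shared across providers.
type SoftwareActuator interface {
	Provision(ctx context.Context, m *Machine) error
}
```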


derekwaynecarr commented Jan 30, 2019

I agree with @maisem; we generally prefer to separate the definition of a machine from the kubelet software version. Making the machine<->node linkage optional is a preferred outcome for us.

@derekwaynecarr
Contributor

Adding @enxebre for his input as well.

@timothysc timothysc added this to the Next milestone Jan 30, 2019
@timothysc timothysc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 30, 2019

davidewatson commented Jan 30, 2019

Historically, the Cluster API has avoided making assumptions about infrastructure and software provisioning, except that I think we have always assumed we are provisioning Kubernetes clusters and not arbitrary resources. Looking at Machine[Spec|Status], all of the common fields are related to Kubernetes Nodes (cf. NodeRef, ProviderID, etc.). When I imagine an infrastructure-only MachineSpec, I wonder whether there will be any common fields.
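For illustration, the Kubernetes-facing common fields look roughly like the following. This is a sketch only; the exact placement (Spec vs Status) and types vary by API version:

```go
// Sketch of the Kubernetes-facing fields; not the actual Cluster API definitions.
package sketch

import corev1 "k8s.io/api/core/v1"

type MachineStatus struct {
	// NodeRef points at the Kubernetes Node backed by this Machine.
	// It has no meaning for a machine that is not a Node.
	NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"`
}

type MachineSpec struct {
	// ProviderID links the Machine to the cloud provider instance and is
	// used to match Machines to Nodes (e.g. by the cluster autoscaler).
	ProviderID *string `json:"providerID,omitempty"`
}
```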

Instead of creating a common controller for the Machine infrastructure provisioning bits, what if we used webhooks (or provider-specific controllers) to abstract out the infrastructure provisioning and kept the Machine actuator for software (and optionally hardware) provisioning of Kubernetes? That way the common Machine resource would still be useful for higher-level tools.

@detiber, @dlipovetsky: ^^


detiber commented Jan 30, 2019

I can see attempts to separate k8s bootstrapping from machine provisioning as a way to leverage commonalities, similar to what @dlipovetsky mentions. That said, I definitely worry about extending the scope of the Machine actuator to support managing machines that are not part of a k8s cluster.

@derekwaynecarr If you remove the ability to link machines to nodes, then don't you also lose the ability to leverage higher level abstractions like the cluster autoscaler?

@alex-mohr

Generally a huge +1 to decoupling, but before designing it, I'll note that there are at least four components that could be split out: {physical,virtual} machine management, base OS management, shared inside-the-OS software, and app-specific software.

So before we go much further, I'd suggest we come up with some specific user-focused cases that we decide should be supported (or explicitly not supported). What is the problem we're solving for users?

(1) When we say arbitrary machines, what do we mean by that? I'd love a world where one could use k8s Machine APIs to manage e.g. a virtual or physical machine that runs an arbitrary OS (linux, windows, *bsd, plan 9, whatever). Another version of arbitrary is "any arbitrary OS as long as it's a linux that uses cloud-init". Which is it?

(2) If I want to run MS SQL Server on Windows, can I do that?

(3) If my company has a separate machine management team, a separate company-wide Linux OS team, and I'm an App Dev on a third team, do the various Cluster and Machine APIs cleanly decompose into objects (or at least responsibilities) that map to their responsibilities?


detiber commented Jan 30, 2019

@davidewatson I do like the idea of making the software provisioning/k8s bootstrapping more reusable, but I also don't want to fully preclude the ability to do the provisioning/bootstrapping through cloud-init or other methods that wouldn't require connecting to the hosts via a remote session (or 100% full decoupling).


vincepri commented Jan 30, 2019

This issue sounds a little out of scope for the cluster-api project, at least from my point of view.

Decoupling Machines from Kubernetes Nodes effectively means that cluster-api should now support a variety of use cases which have different requirements, therefore increasing the scope exponentially.

On the other hand, I do like the idea of having a pluggable interface for software provisioning, but I think that should be captured in a different issue.


detiber commented Jan 30, 2019

> Generally a huge +1 to decoupling, but before designing it, I'll note that there are at least four components that could be split out: {physical,virtual} machine management, base OS management, shared inside-the-OS software, and app-specific software.

@alex-mohr this sounds good in theory, but how do you separate machine management from base OS management? After provisioning a machine (at least in the majority of cloud environments) you have a base OS.

> (3) If my company has a separate machine management team, a separate company-wide Linux OS team, and I'm an App Dev on a third team, do the various Cluster and Machine APIs cleanly decompose into objects (or at least responsibilities) that map to their responsibilities?

I think you can have some items that are split amongst different teams (such as re-using networking, security groups, etc), but I think the idea of extending that separation of concerns down into the actual spinning up and bootstrapping of hosts into a k8s cluster is taking it a bit too far.


pablochacin commented Jan 30, 2019

I think separating the machine provisioning from the software provisioning implies there's a standard way for the software provisioning controller to connect to the machine. As @detiber said, that might not be necessary or even possible in all cases.


enxebre commented Jan 31, 2019

Related to #683
We currently have a clear separation between machine and software/kubelet. The software/kubelet is managed orthogonally, hence seeing version fields in the spec is a bit confusing. We inject a cloud-init/Ignition endpoint via the actuator userdata API field, which then serves the config. Software upgrades/lifecycle management are then oblivious to this API. We still have the ability to match machines to specific nodes, so we can leverage it for health checking, autoscaling, etc.
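A minimal sketch of that pattern. The field name below is hypothetical; each provider's actual spec differs:

```go
// Hypothetical sketch; each provider's actual provider spec differs.
package sketch

// ProviderSpec carries only a pointer to where the instance fetches its
// cloud-init/Ignition config at first boot; kubelet/software versions and
// their lifecycle are managed outside this API.
type ProviderSpec struct {
	UserDataEndpoint string `json:"userDataEndpoint,omitempty"`
}
```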

@dlipovetsky
Contributor

While I think the idea @maisem proposes is worth discussing, I agree with @vincepri that it implies significant "feature creep" and thus can be seen as out of scope.

I'm glad we are reflecting on the fact that machine and software provisioning are currently coupled. I recognize that coupling the two (e.g. cloud-init) can be desirable or necessary. But decoupling is sometimes necessary, e.g. in an environment without an infrastructure API, where software must be provisioned after the machine is. Decoupling can also be desirable, if it makes it easy to re-use the software provisioning piece.

I agree with @detiber that we should consider both use cases, and not push one at the expense of the other.

I agree with @dwat that webhooks may in fact work to decouple the two while keeping the existing machine actuator. But the project is still young; I think it's worth considering alternative actuator designs, too.

@dlipovetsky
Contributor

> I think separating the machine provisioning from the software provisioning implies there's a standard way for the software provisioning controller to connect to the machine.

@pablochacin I think that keeping the two together also implies a standard; the de facto one is cloud-init.

But your comment implies that, if we want to make "software provisioning" something common for providers to easily consume, we'll need to support both the coupled and decoupled use cases. For example, we could maintain standard cloud-init scripts and a standard library for bootstrapping with SSH.
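As a sketch of what supporting both paths behind one reusable piece could look like (all names here are hypothetical, not an existing library):

```go
// Hypothetical sketch of a shared bootstrapping library supporting both paths.
package sketch

import "context"

// Bootstrapper provisions Kubernetes software for one machine.
type Bootstrapper interface {
	Bootstrap(ctx context.Context, address, kubernetesVersion string) error
}

// CloudInitBootstrapper covers the coupled case: it renders a standard
// cloud-init script that the provider injects as user data at creation time.
type CloudInitBootstrapper struct{}

func (CloudInitBootstrapper) Bootstrap(ctx context.Context, address, kubernetesVersion string) error {
	// Render the cloud-init script; nothing to do after the machine boots.
	return nil
}

// SSHBootstrapper covers the decoupled case: it connects to an
// already-running machine and runs the same bootstrap steps remotely.
type SSHBootstrapper struct{}

func (SSHBootstrapper) Bootstrap(ctx context.Context, address, kubernetesVersion string) error {
	// Open an SSH session to the address and run the bootstrap steps there.
	return nil
}
```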


vincepri commented Feb 1, 2019

We should discuss at the next meeting and see what we, as a group, want to do regarding this issue. In the meantime, I'd open a new issue to offer software provisioning tools / methodologies which would be responsible for installing Kubernetes binaries, etc.


detiber commented Feb 1, 2019

Instead of discussing this at the next meeting, maybe we should use the time to formalize a proposal period for after v1alpha1, where major changes like this can be discussed?


davidewatson commented Feb 11, 2019

After hearing from others, I withdraw my undeleted comment. There are valid use cases for creating non-k8s VMs.

@pablochacin
Contributor

> @pablochacin I think that keeping the two together also implies a standard; the de facto one is cloud-init.

Not really, @dlipovetsky. Provisioning together means that we have a standard way to communicate with the provisioner and ask for a new instance to join the cluster, but not necessarily with the machine itself. We don't need, for instance, to manage any kind of credentials for accessing the machines, do we?

Maybe I'm misunderstanding something in the meaning of this paragraph:

> Today, a machine actuator provisions both the machine (e.g. by calling the EC2 API) and the software (e.g. by invoking kubeadm init).

That said, the idea of having a cloud-init reference in the spec, as suggested in @maisem's initial comment, makes a lot of sense. How it is used should depend on the actuator. However, even a cloud-init script may be dependent on the OS of the machine.

@pablochacin
Contributor

Maybe this work in the gardener project is relevant for this discussion:
Cloud config (user-data) for bootstrapping machines

> Gardener will continue to keep knowledge about the content of the cloud config scripts, but it will hand it over to the respective OS-specific controller, which will generate the specific valid representation.

This is the implementation of this concept for CoreOS machines:
https://github.com/gardener/gardener-extensions/tree/master/controllers/os-coreos

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 13, 2019
@timothysc timothysc self-assigned this Jun 14, 2019
@timothysc timothysc modified the milestones: Next, v1alpha2 Jun 14, 2019
@timothysc timothysc added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 14, 2019

ncdc commented Aug 23, 2019

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: v1alpha2, Next Aug 23, 2019
@vincepri
Member

We can close out this issue; we've discussed it at length during Q1/Q2 and stated that this is a non-goal for this project, as outlined in https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/scope-and-objectives.md#non-goals.

/close

@k8s-ci-robot
Contributor

@vincepri: Closing this issue.

In response to this:

> We can close out this issue; we've discussed it at length during Q1/Q2 and stated that this is a non-goal for this project, as outlined in https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/scope-and-objectives.md#non-goals.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jayunit100 pushed a commit to jayunit100/cluster-api that referenced this issue Jan 31, 2020
HAProxy load balancer OVA updates