Create KEP for Instance Host Labelling #1127

alculquicondor · 2019-07-04T16:57:20Z

Proposing a new well-known label for instance host.
In this PR: Summary, Motivation, Proposal, Drawbacks and Alternatives.
Design Details will be included in a follow up.

This PR is superseding #997

/sig cloud-provider
/sig scheduling

/ref kubernetes/kubernetes#75274

k8s-ci-robot · 2019-07-04T16:57:21Z

Welcome @alculquicondor!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2019-07-04T16:57:27Z

Hi @alculquicondor. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2019-07-04T16:57:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alculquicondor
To complete the pull request process, please assign jagosan
You can assign the PR to them by writing /assign @jagosan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-cloud-provider/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alculquicondor · 2019-07-05T17:35:00Z

/cc @ahg-g

andrewsykim · 2019-07-08T16:31:34Z

Related discussion in the mailing list https://groups.google.com/forum/#!topic/kubernetes-sig-cloud-provider/32N59IYXogY

draveness

/ok-to-test

alculquicondor · 2019-07-11T16:56:35Z

cc @thockin and @embano1, who laid down some ideas in the thread

embano1 · 2019-07-12T14:45:00Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+#### Story 1
+
+As an application developer, I want pods to be scheduled in the same instance host so that networking between them is faster.


This is a very valid scenario. 👍

Would be great to highlight in which scenarios instance-host is useful over just kubernetes node topology for this case.

Are we saying that instance-host affinity would have to guarantee faster inter-instance networking than a zone?

While I would call this (depending on faster networking by co-scheduling) an anti-pattern I don't think we should hide information or stop people from making their own decisions.

Under no circumstances can we proscribe the semantic implications of a platform when co-scheduling. It might be faster, it might be the same, or it might be slower. We can not say anything about that, in the abstract. A given implementation may choose to document this or not.

So this use case bothers me a lot. Pods are the unit of co-scheduling by design to prevent this scenario which is degenerative. Typically if you care that much you will make it part of the pod itself otherwise you get into scheduling issues. A better way folks typically describe this is by use of other custom labels and calculations of distance. region/zone/cluster/rack are common labels and folks can apply weights to imply distance from those labels. This is how data-gravity scheduling typically works.

Story 2 was actually the driver for this KEP #75274 . However, even pod spreading is a feature in development, whereas pod (anti)affinity already exists. We can remove this Story if it's something we don't want to promote.

This is definitely a story that I don't think we want to promote given the history behind pods (co-scheduling) and existing affinity rules.

embano1 · 2019-07-12T14:49:38Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+#### Story 2
+
+As an application developer, I want pods to be spread in different instance hosts so that if the host goes down, my service doesn’t have to scale from zero.
+I use `topology.kubernetes.io/instance-host` as topology key in [even pod spreading](/keps/sig-scheduling/20190221-even-pods-spreading.md) rules.


I would say that this can already be solved by using the failure-domain.beta.kubernetes.io/zone label. In public cloud environments, VM placement within a region is usually random (e.g. the user does not have full control on it). So a multi-node environment almost by default runs in multiple zones so that the zone label is sufficient for this use case - and also provides higher redundancy than host isolation as zones in the public cloud are highly isolated from each other.

In private cloud environments, a zone is not strictly defined (almost arbitrary, like any other label that could be applied). Thus, in the vSphere cloud provider we give the operator flexibility to define what a zone is via vSphere tags associated to vSphere objects, e.g. a host, cluster or even data center. This tag is read and used for populating failure-domain.beta.kubernetes.io/zone. I acknowledge that different on-prem cloud provider implementations have different semantics here (which makes this discussion not easier...).

I would like to have feedback from other providers on this

Wearing my Google Cloud hat: GCP does not provide this information and does not guarantee that a VM will not be live-migrated to another physical host at runtime. In fact, we practically guarantee that any VM of sufficiently long life will get migrated eventually.

We are designing a solution for populating the label that supports migration, basically a daemonset that frequently checks if the VM got migrated, and updates the label accordingly. Of course this assumes that the cloud provider have an API that allows us to do that (in Google Cloud for example, perhaps this can be provided through the instance metadata service).

I have to double check but you can do this today with your own labeling, and it's not necessarily applicable across providers.

This is covered in Alternatives section.

Afaics, our advice to date has been that if you need failure resilience then you spread your pods by the abstract failure-domain.beta.kubernetes.io/zone label. Once you have done that, then there is no need to also spread by instance host as proposed here (ditto for other possible concrete failure domains like "rack" or "power distribution unit").

Is there something in this story that is not already covered by our existing advice
re failure-domain.b.k.io/zone?

I understand that some clouds (notably AWS) also have a specific concept called a "zone", and that the cloud zones might nest differently to the k8s "zone". Perhaps the motivation for this KEP comes from a misunderstanding as to what/how the k8s "zone" label should be used? (in which case I think the solution is better documentation, or possibly renaming the label, not adding additional labels)

embano1 · 2019-07-12T14:52:33Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+    There is one case worth mentioning:
+    An empty topology key for `preferredDuringSchedulingIgnoredDuringExecution` in inter-pod anti-affinity is interpreted as
+    "all topologies", so the instance host should be considered as well.
+    If the label value is empty, then the anti-affinity is still ensured by the hostname.


Thanks for pointing this out!

embano1 · 2019-07-12T14:53:01Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+- Risk: Some or all nodes are missing a value for the instance host label and an application developer tries to use the label as a topology key.
+
+  Mitigation: This will be documented as undefined behavior.


This should be no different that any other topo key.

embano1 · 2019-07-12T15:10:52Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+- Risk: If a cloud provider or hypervisor performs a live migration of a VM, the host will change.
+
+  Mitigation: The label should be updated. Evicting running pods is out-of-scope for the scheduler.
+  New pods are scheduled according to the new labels.


This is the biggest concern I have with this KEP. YMMV and the following comments are highly dependent on the hypervisor/virtualization platform (capabilities available/exposed/configured).

@bsalamat brought up an important point:

We have accepted the fact that clusters are dynamic in nature and scheduling decisions may be invalidated in the future. This is an assumption that other new features should make too.

I agree with him (because fundamentally that's the current implementation of the scheduler). On the other side, from a UX/ user experience perspective this is not what is known/desired. Especially in cases where hard pod anti-affinity is needed, e.g. critical stateful services. Now, a workaround would be to use zone (assuming zone membership is static and won't change, which might be or not be the case...) instead of instance label here - and the users I've worked with do that and never reported any issues with it.

My feeling is that a majority of users would use the instance label for resiliency, which ultimately is not guaranteed when VMs migrate, labels need to be updated and workloads have to be evicted/restarted to reconcile. So why not keep using the zones label for resiliency than? I don't want to repeat every argument of the discussion here, so pointing to the former KEP which has more details: #997 (review)

Some more findings from the field (real-world on-prem deployments I was involved with): when we planned to add zones support in the vSphere cloud provider we went with a two phase approach: first, implement the basic functionality. Second, try to solve the "label drift" issue. It turned out that solving the "label drift" issues was impossible, i.e. it always had consequences (impact) to user workloads. This is not what we wanted to go through, so phase two never was implemented. Did that cause a problem? Not for our users. They leverage zones topology (again that could be arbitrary on-prem) to model resiliency constructs. Whenever needed, vSphere will be configured to never move VMs if that could cause "label drift", e.g. move VM from zone 1 to zone 2 because of cluster imbalance or host failure. I understand that this might not be possible (available) with all hypervisors - which complicates the matter unfortunately.

Also, @anfernee built a prototype to influence scheduling decisions honoring host information. It couldn't solve the "label drift" issue and also did not get a lot of traction because our user base was happy with the current (zones) implementation.

I am not saying to keep the status quo. But assuming that we could easily update labels and restart workloads is a weak assumption, especially for stateful production critical workloads in highly dynamic clusters. Also it might be worth pointing out that not all Kubernetes users can simply modify their workloads to comply with the Kubernetes semantics, be it because they don't have the manpower, access to source code, etc.

@tallclair do you have any thoughts on these semantics for live migration?

I would argue that if a VM migrates, the label MUST be updated. Whether that ripples out into a pod reschedule is a totally different question. We don't have a rescheduler today, but hard anti-affinity might be broken by a migration, which would effectively require either a resched or that we loosen the guarantees for affinity/anti-affinity.

(Just highlighting that live-migrating an instance/pod, and then restarting the pod because its label changed, is equivalent to just not live-migrating it in the first place. I advocate taking the simpler option and just not engaging with this arms race..)

@anguslees the problem is that the hypervisor likely doesn't have a way to tell if it can migrate a VM without breaking any restrictions for the pod.
In the re-scheduling topic, note that we face the same problem when adding new taints to a node for which running pods don't have a toleration.

embano1 · 2019-07-12T15:15:25Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+## Alternatives
+
+Do nothing.


Actually, this sounds a bit naive but might just be ok 😀 On the other side, nothing speaks against adding this label if the use cases and impact (e.g. drift, etc.) are clearly articulated. Customers can then decide which label to use for proximity, resiliency, spreading, etc.

andrewsykim · 2019-07-12T15:34:45Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+## Alternatives
+
+Do nothing.
+Cloud providers and on-prem clusters could support this topology by defining their own label and controllers to populate it.


We should survey the existing providers to see if exposing this topology is even possible. I get the sense that it's currently only useful for 1-2 providers, for the others it's just not feasible or the API does not expose that information. If we find the use-case is common enough we should codify it.

Maybe we can put up a vote on this one in SIG Cloud Provider? Just because a current cloud provider can't support it, doesn't mean they won't want it in the future.

I agree. Andrew, what is the best way to survey existing providers? should we do it on this KEP or send something like a form?

Also, can you please tag the stakeholders from sig-cloud-provider?

Something in the mailing list is ideal and then final vote on the next SIG call.

andrewsykim · 2019-07-12T15:37:27Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+## Proposal
+
+Add a well-known label for instance host: `topology.kubernetes.io/instance-host` and make its default value empty.


Don't love the label name topology.kubernetes.io/instance-host TBH, I feel it would get confused with just "instance". The alternative is "topology.kubernetes.io/hypervisor` but that is too specific maybe?

Alternatives I can think of:

physical-machine sounds like jargon and it's long.

vm-host can it be confused with just vm?

Some others I've thought of:

topology.kubernetes.io/vmm (virtual machine monitor)

topology.kubernetes.io/hardware

None of them stick out to me though

What about topology.kubernetes.io/physical-host (mentioned on #997)? it avoids the confusion and is quite descriptive as well.

topology.kubernetes.io/physical-host could be a better name IMO.

physical-host is not egregious. The default should not be empty, but the whole label should be undefined.

timothysc · 2019-07-18T17:18:06Z

/hold
/assign @bgrant0607 @smarterclayton @liggitt @thockin
/cc @kubernetes/sig-architecture-feature-requests

This one is pretty thorny and imo requires arch approval b/c it means treating label as fully fledged api consts.

andrewsykim · 2019-07-18T17:33:34Z

See thread in https://groups.google.com/forum/#!topic/kubernetes-sig-cloud-provider/32N59IYXogY for more context

alculquicondor · 2019-07-19T15:03:45Z

FYI to reviewers, a survey was created

/cc @bsalamat

bsalamat · 2019-07-19T23:49:36Z

This one is pretty thorny and imo requires arch approval b/c it means treating label as fully fledged api consts.

@timothysc by removing semantic aspects of labels we will limit ourselves much from implementing advanced features in Kubernetes. Today, the scheduler uses labels to find about nodes, zones, and regions. What is proposed here is very similar. We could remove all the logic that treats these labels specially, but then features such as default spreading of pods among nodes and zones that is one of the widely used features of K8s is removed as well.
It is fine and appreciated to have an arch review of this PR, but please consider all pros and cons of this feature when making a decision.

timothysc · 2019-07-22T19:54:21Z

@bsalamat - we're lacking policy and formality here to treat them like 1st class APIs which they should be. TL;DR sig-arch really needs to take an official stance on this issue. We've skirted around it for a long time, saying on thing and doing another.

I'm not opposed to this work, but it's abundantly clear that we are saying and doing two different things.

thockin

@bsalamat said:

Today, the scheduler uses labels to find about nodes, zones, and regions

One could argue that it shouldn't, or at least that the set of "hardcoded" topology labels should be a flag or config-file. That way, a provider which offers this new key can put that key into the config. Then it doesn't matter if it is kubernetes.io/something or provider.com/something. Obviously that would matter to users. I am not convinced this is widespread enough to worry about user-provider portability.

@timothysc said:

sig-arch really needs to take an official stance on this issue

I don't disagree, but we have SO MANY things in flight, I don't know that another architectural proclamation is going to help anyone. I care about this topic, but I simply don't have enough hours in the day to do the work.

Summary: I don't hate this but it's a one-off fix to a specific instance of a class of problems. If enough providers support it and we can agree to (and clearly document) sane semantics in the face of migration, I won't block this.

thockin · 2019-07-23T23:08:16Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+## Proposal
+
+Add a well-known label for instance host: `topology.kubernetes.io/instance-host` and make its default value empty.


physical-host is not egregious. The default should not be empty, but the whole label should be undefined.

thockin · 2019-07-23T23:10:31Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+#### Story 1
+
+As an application developer, I want pods to be scheduled in the same instance host so that networking between them is faster.


While I would call this (depending on faster networking by co-scheduling) an anti-pattern I don't think we should hide information or stop people from making their own decisions.

Under no circumstances can we proscribe the semantic implications of a platform when co-scheduling. It might be faster, it might be the same, or it might be slower. We can not say anything about that, in the abstract. A given implementation may choose to document this or not.

thockin · 2019-07-23T23:13:31Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+#### Story 2
+
+As an application developer, I want pods to be spread in different instance hosts so that if the host goes down, my service doesn’t have to scale from zero.
+I use `topology.kubernetes.io/instance-host` as topology key in [even pod spreading](/keps/sig-scheduling/20190221-even-pods-spreading.md) rules.


Wearing my Google Cloud hat: GCP does not provide this information and does not guarantee that a VM will not be live-migrated to another physical host at runtime. In fact, we practically guarantee that any VM of sufficiently long life will get migrated eventually.

thockin · 2019-07-23T23:15:56Z

keps/sig-cloud-provider/20190704-instance-host-labelling.md

+
+- Risk: Some or all nodes are missing a value for the instance host label and an application developer tries to use the label as a topology key.
+
+  Mitigation: This will be documented as undefined behavior.


This should be no different that any other topo key.

thockin · 2019-07-24T15:25:48Z

Why does this want to be a Daemonset rather than a simple controller running $somewhere. E.g. the per-provider cloud-controller-manager ?

…

On Tue, Jul 23, 2019 at 5:14 PM ahg-g ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In keps/sig-cloud-provider/20190704-instance-host-labelling.md <#1127 (comment)> : > +Add a well-known label for instance host: `topology.kubernetes.io/instance-host` <http://topology.kubernetes.io/instance-host> and make its default value empty. + +The label will be named `failure-domain.kubernetes.io/instance-host` <http://failure-domain.kubernetes.io/instance-host> if KEP [Promoting cloud provider labels to GA](#839) doesn’t land. + + +### User Stories + +#### Story 1 + +As an application developer, I want pods to be scheduled in the same instance host so that networking between them is faster. +I use `topology.kubernetes.io/instance-host` <http://topology.kubernetes.io/instance-host> as topology key in [inter-pod affinity](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity-beta-feature) rules. + +#### Story 2 + +As an application developer, I want pods to be spread in different instance hosts so that if the host goes down, my service doesn’t have to scale from zero. +I use `topology.kubernetes.io/instance-host` <http://topology.kubernetes.io/instance-host> as topology key in [even pod spreading](/keps/sig-scheduling/20190221-even-pods-spreading.md) rules. We are designing a solution for populating the label that supports migration, basically a daemonset that frequently checks if the VM got migrated, and updates the label accordingly. Of course this assumes that the cloud provider have an API that allows us to do that (in Google, perhaps we can provide that through the instance metadata service). — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#1127?email_source=notifications&email_token=ABKWAVB2PS4UOIFAGEQOYJLQA6NFPA5CNFSM4H55O752YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB7LNUHY#discussion_r306583619>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABKWAVHPTJGOVVWDUXUYF6DQA6NFPANCNFSM4H55O75Q> .

ahg-g · 2019-07-24T16:57:47Z

Why does this want to be a Daemonset rather than a simple controller running $somewhere. E.g. the per-provider cloud-controller-manager ?

We may need to use a Daemonset if the API for retrieving the instance's physical host is local to the instance (e.g., GCE instance metadata, although GCE currently does not support that yet), but using a cloud-controller-manager is certainly the simpler option if that is not the case.

andrewsykim · 2019-07-24T17:05:48Z

Reminder for folks to fill out this survey by @alculquicondor so we can gauge how many providers want/can support this.

https://docs.google.com/forms/d/e/1FAIpQLSew1s81jKE_dWt55CeZdbEqkgYJb04lJPshCNQqnFzg4VV3WA/viewform

khenidak · 2019-07-24T23:00:24Z

Thank you for this. Cloud providers AFIAK does not make any guarantee on faster network for VMs on the same physical host. Instead they make guarantees around minimum bandwidth within a zone, datacenter or a similar geo concepts. Hence leaking the physical host info into the cluster won't create any additional value, at least from the public cloud point of view.

I have not seen a cloud provider that provides means to learn about and (leak) physical host name in a VM. Instead constructs like zone and fault domains are used. Those constructs are then fed into the cluster to be used as value for zone (or even custom labels) and are used by node selector and pod affinity features. Collectively they satisfy requirements that are described in Story 1 and Story 2.

For on-prem scenarios, the physical host name can be labeled on the node or fed as values for zone labels and still be used the same way. That at least grantees some level of portability between deployments created on public clouds and the ones created for on prem.

For the reasons above i don't think this KEP should be accepted

thockin · 2019-07-25T00:38:14Z

Since it came up in conversation again today, a recap of my POV:

Nothing stops any cloud provider from decorating nodes with any label they want.

Topology is arbitrary and not necessarily hierarchical. A cloud provider might have any number of topology-related concepts (including 0) that a node exists in.

If a cloud provider knows which "physical host" a node is on, it is free to add that as a label.

If the "physical host" is a common enough thing that we want to grant it a kubernetes.io/ prefix, but I am skeptical.

"Physical host" is not a terrible name, but it is very precise. We may want something more abstract (or not). It could just as easily be "power supply" or "blast zone" or "underlying OS UUID".

No topology key has any particular requirements for semantic meaning. Common things like "zone" and "region" have conventional meanings, and providers who violate those conventions may give their users a bad trip, but they are not invalid.

If we give any label a kubernetes.io/ prefix we must be clear on what is GUARANTEED to happen, what is guaranteed to NOT happen, and what is not guaranteed either way when that label changes (e.g. a live-migration of a VM to a new physical host). We must also define the implications for "hard" scheduling requirements that become invalidated by a migration.

If we want the scheduler to automatically spread across topologies, we should probably not hardcode the special labels. Instead we should take a flag or config which indicates which labels are to be treated specially. The default might be topology.kubernetes.io/zone, but any user could change that (if they control the master) and any hosted master (e.g. GKE, EKS, AKS, ...) could set it to whatever is appropriate for their cloud.

If sig-cloud-provider folks tell me they think this use-case is common enough to justify an elevated prefix, I will go with it, but I will want to see the answers to the above questions.

anguslees · 2019-07-25T02:26:47Z

From a users pov (rather than cloud provider): I just want a way to indicate "spread wide for fault resilience" or "group close for (presumably) better performance/cost". And I want to express that in a cloud-agnostic manner, so it can be burned into (eg) helm charts.
What I don't want is to have to come up with a unique way to express that across each cloud provider, based on the specific provider's choice of technology and terminology.

I object to this KEP on those grounds. I think we already have the former useful knobs (in the abstract region/zone and concrete node labels), and I think this KEP introduces the latter harmful knobs. For the 0.1% of workloads that really do care about specific underlying topology, then they can (and should!) be configured using provider-specific labels, since they are already not naively-portable across providers.

Imo, any mapping between instance-host (or whatever other significant topology) and abstract k8s scheduling concept (ie: region/zone) should be performed inside the cloud provider code. On cloud providers where instance-hosts exist and are exposed to users, then the provider can already make that information available to users in debugging status/logs or clearly-marked provider-specific node labels.

andrewsykim · 2019-07-25T02:58:48Z

Thanks @thockin!

If sig-cloud-provider folks tell me they think this use-case is common enough to justify an elevated prefix, I will go with it, but I will want to see the answers to the above questions.

This is the part we are trying to figure out at the moment. It seems like most cloud providers don't expose this topology information and for those that do it requires some workarounds. The sentiment I'm getting so far is that the use-case is actually not common enough because either the cloud provider doesn't expose the information or the live-migrate semantics are too complex and not worth the trade-off in which case I think we're okay to close this KEP? Will defer to @alculquicondor based on the results of the survey he sent out.

ahg-g · 2019-07-25T13:19:39Z

I just want to state my concern with making it a recommended practice to use the zone label for physical hosts for on-prem. While it may make the spec portable between on-prem and on-cloud, I suspect users will find the cost for this portability high since some cloud providers charge for cross-zone traffic.

kitch · 2019-07-25T13:26:20Z

Just some info from the IBM Cloud side of things. Today we do not expose this topology info for an instances that are running on shared infrastructure. There is some consideration for those who purchase dedicated hosts to more easily gather metadata about which host an instance is on. If you work from the host down then you can discover this information, but it is not readily available from within the OS. In the future if we were to add support for dedicated hosts in IKS we would likely leverage this instance host label feature. In our internal designs to date I was just planning on abusing the zone labeling. :)

andrewsykim · 2019-07-25T19:48:54Z

I suspect users will find the cost for this portability high since some cloud providers charge for cross-zone traffic.

For public clouds I don't think this applies because they don't expose the physical host topology anyways so you would use zones to represent AZs For on-prem, you can still use regions topology for DC or site topology if you use zones for physical host. But also for on-prem, you rarely see clusters spanning multiple DCs (at least from my experience), so topologies typically won't span cross-zone anyways.

alculquicondor · 2019-07-25T21:45:47Z

Given:

The lack of support from most cloud providers to expose the information, and
The possibility for on-prem clusters to make use of zone to represent this topology

We are abandoning this KEP. Thanks for the feedback to all the reviewers.

ahg-g · 2019-07-25T21:47:14Z

Given:

The lack of support from most cloud providers to expose the information, and

The possibility for on-prem clusters to make use of zone to represent this topology

We are abandoning this KEP. Thanks for the feedback to all the reviewers.

Thanks Aldo for driving this issue to a decision, and thanks everyone for the feedback.

Create KEP for Instance Host Labelling

21ec611

k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Jul 4, 2019

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 4, 2019

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 4, 2019

k8s-ci-robot requested review from andrewsykim and hogepodge July 4, 2019 16:57

k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jul 4, 2019

alculquicondor mentioned this pull request Jul 4, 2019

Instance Host Labelling #998

Closed

k8s-ci-robot requested a review from ahg-g July 5, 2019 17:35

draveness reviewed Jul 9, 2019

View reviewed changes

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 9, 2019

embano1 reviewed Jul 12, 2019

View reviewed changes

andrewsykim reviewed Jul 12, 2019

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 18, 2019

k8s-ci-robot assigned bgrant0607 Jul 18, 2019

k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label Jul 18, 2019

k8s-ci-robot assigned liggitt, smarterclayton and thockin Jul 18, 2019

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 18, 2019

timothysc self-assigned this Jul 18, 2019

k8s-ci-robot requested a review from bsalamat July 19, 2019 15:03

andrewsykim mentioned this pull request Jul 22, 2019

Create KEP for instance host labelling #997

Closed

thockin reviewed Jul 23, 2019

View reviewed changes

alculquicondor mentioned this pull request Jul 24, 2019

REQUEST: New membership for alculquicondor kubernetes/org#1042

Closed

6 tasks

alculquicondor closed this Jul 25, 2019

davidopp mentioned this pull request Aug 8, 2019

[Feature] Support for anti-affinity/affinity rules for the created machines kubernetes-sigs/cluster-api-provider-vsphere#175

Closed

alculquicondor mentioned this pull request Apr 2, 2020

Add KEP 1659 - standard topology labels #1660

Merged


		#### Story 1

		As an application developer, I want pods to be scheduled in the same instance host so that networking between them is faster.


		- Risk: Some or all nodes are missing a value for the instance host label and an application developer tries to use the label as a topology key.

		Mitigation: This will be documented as undefined behavior.


		## Proposal

		Add a well-known label for instance host: `topology.kubernetes.io/instance-host` and make its default value empty.

Create KEP for Instance Host Labelling #1127

Create KEP for Instance Host Labelling #1127

Conversation

alculquicondor commented Jul 4, 2019 • edited Loading

k8s-ci-robot commented Jul 4, 2019

k8s-ci-robot commented Jul 4, 2019

k8s-ci-robot commented Jul 4, 2019

alculquicondor commented Jul 5, 2019

andrewsykim commented Jul 8, 2019

draveness left a comment

Choose a reason for hiding this comment

alculquicondor commented Jul 11, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alculquicondor Jul 24, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahg-g Jul 24, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewsykim Jul 12, 2019 • edited Loading

Choose a reason for hiding this comment

ahg-g Jul 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

timothysc commented Jul 18, 2019

andrewsykim commented Jul 18, 2019

alculquicondor commented Jul 19, 2019

bsalamat commented Jul 19, 2019

timothysc commented Jul 22, 2019 • edited Loading

thockin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin commented Jul 24, 2019 via email

ahg-g commented Jul 24, 2019 • edited Loading

andrewsykim commented Jul 24, 2019

khenidak commented Jul 24, 2019

thockin commented Jul 25, 2019

anguslees commented Jul 25, 2019 • edited Loading

andrewsykim commented Jul 25, 2019

ahg-g commented Jul 25, 2019

kitch commented Jul 25, 2019

andrewsykim commented Jul 25, 2019

alculquicondor commented Jul 25, 2019

ahg-g commented Jul 25, 2019

alculquicondor commented Jul 4, 2019 •

edited

Loading

alculquicondor Jul 24, 2019 •

edited

Loading

ahg-g Jul 24, 2019 •

edited

Loading

andrewsykim Jul 12, 2019 •

edited

Loading

ahg-g Jul 12, 2019 •

edited

Loading

timothysc commented Jul 22, 2019 •

edited

Loading

ahg-g commented Jul 24, 2019 •

edited

Loading

anguslees commented Jul 25, 2019 •

edited

Loading