PetSet (was nominal services) #260
Note that we should probably also rename service to lbservice or somesuch to distinguish them from other types of services.
As part of this, I'd remove service objects from the core apiserver and facilitate the use of other load balancers, such as HAProxy and nginx.
It would be nice if the logical definition of a service (the query and/or global name) could be used/specialized in multiple ways: as a simple load balancer installed via the infrastructure, as a more feature-complete load balancer like nginx or haproxy also offered by the infrastructure, as a queryable endpoint an integrator could poll/wait on (GET /services/foo -> { endpoints: [{host, port}, ...] }), or as information available to hosts to expose local load balancers. Obviously these could be multiple different use cases and as such split into their own resources, but having some flexibility to specify intent (unify under a lb) distinct from mechanism makes it easier to satisfy a wide range of requirements.
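For the queryable-endpoint use case, a minimal sketch of what polling such an endpoint could return, using the shape of the Endpoints resource that Kubernetes later exposed in the v1 API (names, IPs, and ports here are placeholders):

```yaml
# Illustrative only: the kind of object a client could GET (or watch) to
# resolve the service "foo" to its backend addresses. Values are made up.
kind: Endpoints
apiVersion: v1
metadata:
  name: foo              # same name as the service "foo"
subsets:
  - addresses:
      - ip: 10.1.2.3     # pods currently selected by the service's label query
      - ip: 10.1.2.4
    ports:
      - port: 9376
```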
@smarterclayton I agree with separating policy and mechanism. Primitives we need:
This would be enough to compose with other naming/discovery mechanisms and/or load balancers. We could then build a higher-level layer on top of the core that bundles common patterns with a simple API.
Given the two primitives described by @bgrant0607, is it worth keeping this issue open? Or are there more specific issues we can file?
I don't think the Zookeeper case is solved, since you need a unique identifier in each container. I think you could do this with 3 separate replication controllers (one per instance) or with a mode on the replication controller.
Service design I think deserves some discussion, as Brian notes. Currently it couples an infrastructure abstraction (local proxy) with a mechanism for exposure (environment variables in all containers) with a label query. There is an equally valid use case for an edge proxy that takes L7 hosts/paths and balances them to a label query, as well as supporting protocols like http(s) and WebSockets. In addition, services today have a hard scale limit of 60k backends, shared across the entire cluster (the number of IPs allocated). It should be possible to run a local proxy on a minion that proxies only the services the containers on that host need, and also to avoid containers having to know about the external port. We can move this discussion to #494 if necessary.
Tackling the problem of singleton services and non-auto-scaled services with fixed replication, such as master-slave replicated databases, key-value stores with fixed-size peer groups (e.g., etcd, Zookeeper), etc.

The fixed-replication cases require predictable, array-like behavior. Peers need to be able to discover and individually address each other. These services generally have their own client libraries and/or protocols, so we don't need to solve the problem of determining which instance a client should connect to, other than to make the instances individually addressable.

Proposal: We should create a new flavor of service, called Cardinal services, which map N IP addresses instead of just one. Cardinal services would perform a stable assignment of these IP addresses to N instances targeted by their label selector (i.e., a specified N, not just however many targets happen to exist). Once we have DNS (#1261, #146), it would assign predictable DNS names based on a provided prefix, with suffixes 0 to N-1. The assignments could be recorded in annotations or labels of the targeted pods.

This would preserve the decoupling of role assignment from the identities of pods and replication controllers, while providing stable names and IP addresses, which could be used in standard application configuration mechanisms.

Some of the discussion around different types of load balancing happened in the services v2 design: #1107. I'll file a separate issue for master election.
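Cardinal services were never implemented under that name, so the following is only a hypothetical sketch of what such an object might have looked like, based on the description above; the kind and every field shown are invented for illustration:

```yaml
# Hypothetical "Cardinal service" sketch: this kind and these fields do not
# exist in Kubernetes; they only illustrate the proposal above.
kind: CardinalService
apiVersion: v1beta3
metadata:
  name: etcd
spec:
  selector:
    app: etcd          # label query selecting the peer pods
  cardinality: 3       # N stable identities, regardless of how many targets exist
  dnsPrefix: etcd      # would yield etcd-0, etcd-1, etcd-2 once DNS (#1261) lands
# Each of the N slots would get a stable IP, and the slot assignment would be
# recorded back onto the targeted pod as a label or annotation.
```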
The assignments would have to carry through into the pods via some environment parameterization mechanism (almost certainly). For the etcd example, I would create:
If pod 2 dies, replication controller 2 creates a new copy of it and reattaches it to volume B. Cardinal service 'etcd' knows that that pod is new, but how does it know that it should be cardinality 2 (which comes from data stored on volume B)?
Rather than 3 replication controllers, why not a sharding controller? It just seems wrong to place that burden on users if this is a relatively common pattern.

Do you think it matters if the nominal IP for a given pod changes due to membership changes? For example:

- at time 0, pods (A, B, C) make up a cardinal service, with IPs 10.0.0.{1-3}
- at time 1, the node which hosts pod B dies
- at time 2, the replication controller driving B creates a new pod D
- at time 3, the cardinal service changes to (A, C, D) with IPs 10.0.0.{1-3}

NB: pod C's "stable IP" changed from 10.0.0.3 to 10.0.0.2 when the set membership changed. To circumvent this, we would need to have the ordinal values specified per pod rather than recomputed from the current membership.
I think a sharding controller makes sense and is probably more useful in the context of a cardinal service. I do think that IP changes based on membership are scary, and I can think of a bunch of degenerate edge cases. However, if the cardinality is stored with the pods, the decision is less difficult.
First of all, I didn't intend this to be about sharding -- that's #1064. Let's move sharding discussions to there. We've seen many cases of trying to use an analogous mechanism for sharding, and we concluded that it's not the best way to implement sharding. |
Second, my intention is that it shouldn't be necessary to run N replication controllers. It should be possible to use only one, though the number required depends on deployment details (canaries, multiple release tracks, rolling updates, etc.). |
Fourth, I agree we probably need to reflect the identity into the pod. As per #386, ideally a standard mechanism would be used to make the IP and DNS name assignments visible to the pod. How would IP and host aliases normally be surfaced in Linux? (See the sketch below.)
Fifth, I suggested that we ensure assignment stability by recording assignments in the pods via labels or annotations. |
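As a partial, present-day answer to the question above (both mechanisms postdate this discussion): a pod can get extra /etc/hosts entries via spec.hostAliases, and can see its own IP through the downward API. A minimal sketch, with made-up names and addresses:

```yaml
# Sketch only: hostAliases adds entries to the pod's /etc/hosts, and the
# downward API exposes the pod's own IP as an environment variable.
apiVersion: v1
kind: Pod
metadata:
  name: peer-0
spec:
  hostAliases:
    - ip: "10.0.0.2"
      hostnames: ["peer-1"]   # stable alias for another member of the set
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "cat /etc/hosts && echo $POD_IP && sleep 3600"]
      env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
```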
That's JSON. It's an alpha feature added to a GA object (init containers in pods).
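For context, before init containers became a first-class field they were declared as JSON inside an annotation on the (GA) Pod object. Roughly along these lines; the annotation key and shape are recalled from the 1.3-era alpha and should be treated as approximate:

```yaml
# Approximate sketch of the alpha mechanism: init containers supplied as a
# JSON array inside a pod annotation rather than as a typed field.
apiVersion: v1
kind: Pod
metadata:
  name: example
  annotations:
    pod.alpha.kubernetes.io/init-containers: '[
      {
        "name": "setup",
        "image": "busybox",
        "command": ["sh", "-c", "echo preparing"]
      }
    ]'
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
```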
@paralin here is what I am working on. No time to document it and get it into the k8s repo now, but that is the long-term plan. https://github.com/k8s-for-greeks/gpmr/tree/master/pet-race-devops/k8s/cassandra Is working for me locally, on HEAD. The latest C* image in the demo works well. We do have an issue open for more documentation. Wink wink, nudge @bprashanth
PetSets example with an etcd cluster: kubernetes-retired/contrib#1295
Be sure to capture design asks on the proposal doc after you finish review.
The PetSet docs are https://github.com/kubernetes/kubernetes.github.io/blob/release-1.3/docs/user-guide/petset.md and https://github.com/kubernetes/kubernetes.github.io/tree/release-1.3/docs/user-guide/petset/bootstrapping. I plan to close this issue and open a new one that addresses moving PetSet to beta, unless anyone objects.
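For reference, a minimal PetSet along the lines of those 1.3 docs might look roughly like this (a sketch only: the image, sizes, and names are placeholders, and the resource was later renamed StatefulSet):

```yaml
# Rough sketch of a 1.3-era PetSet: stable ordinal identities (etcd-0..etcd-2),
# a governing headless service for per-pod DNS, and a volume claim per pet.
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: etcd
spec:
  serviceName: etcd      # headless service that provides the per-pod DNS names
  replicas: 3
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
        - name: etcd
          image: quay.io/coreos/etcd:v2.3.7
          ports:
            - containerPort: 2379
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```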
Automatic merge from submit-queue. Proposal for implementing nominal services, AKA StatefulSets, AKA The-Proposal-Formerly-Known-As-PetSets. This is the draft proposal for #260.
Change command setprefixname to setnameprefix
Bug fix where nil IP (byte slice) was dereferenced and caused the goroutine to hang. Fixes kubernetes#260 and kubernetes#267
This is a first step towards removing the mock CSI driver completely from e2e testing in favor of the hostpath plugin. With the recent hostpath plugin changes (PR kubernetes#260, kubernetes#269), it supports all the features supported by the mock CSI driver. Using the hostpath plugin for testing also covers CSI persistence use cases.
Update embargo doc link in SECURITY_CONTACTS and change PST to PSC
@smarterclayton raised this issue in #199: how should Kubernetes support non-load-balanced and/or stateful services? Specifically, Zookeeper was the example.
Zookeeper (or etcd) exhibits 3 common problems:
And it enables master election for other replicated services, which typically share the same problems, and probably need to advertise the elected master to clients.