Proposal: Isolate kubelet from etcd #860

smarterclayton · 2014-08-12T01:24:52Z

Discusses the current security risks posed by the kubelet->etcd pattern
and discusses some options.

Triggered by #846 and referenced in #859

Does not have to be merged, for discussion and review

brendandburns · 2014-08-12T20:09:07Z

docs/proposals/isolate_minions_from_etcd.md

+
+## Preferred solution:
+
+Implement the first parts of option 3 - an efficient watch API for the pod, service, and endpoints data for the Kubelet and Kube Proxy.  Authorization and authentication are planned in the future - when a solution is available, implement a custom authorization scope that allows API access to be restricted to only the data about a single minion or the service endpoint data.  Replace the event publishing mechanism in the kubelet with a polling mechanism or a simple API endpoint and guard it similarly to the other minion specific requests, and ensure the data is correctly attributed to the source.  Make the apiserver stateless - this is already a desirable outcome.
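Here is a minimal Go sketch of the "custom authorization scope", assuming each credential is bound to exactly one minion name. The `minionScope` type and its `Allows` method are hypothetical. The `/minionPods/<host>` paths are borrowed from the API shapes discussed later in this thread, and `/endpoints` is an assumed stand-in for wherever service endpoint data is served. The sketch only models read access, so the kubelet's event publishing endpoint is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// minionScope is a hypothetical authorization scope: a credential bound to a
// single minion that may only read that minion's pods plus the service
// endpoint data the proxy needs.
type minionScope struct {
	Host string
}

// Allows reports whether a request is permitted under this scope.
// Only reads are modeled; "/endpoints" is an assumed path.
func (s minionScope) Allows(method, path string) bool {
	if method != "GET" {
		return false
	}
	switch {
	case path == "/minionPods/"+s.Host, path == "/watch/minionPods/"+s.Host:
		return true
	case path == "/endpoints", strings.HasPrefix(path, "/endpoints/"):
		return true
	}
	return false
}

func main() {
	s := minionScope{Host: "minion-1"}
	fmt.Println(s.Allows("GET", "/minionPods/minion-1"))    // true
	fmt.Println(s.Allows("GET", "/minionPods/minion-2"))    // false
	fmt.Println(s.Allows("DELETE", "/minionPods/minion-1")) // false
}
```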


I'm in favor of this approach as well. @jbeda and I were just suggesting migrating to watch on top of the API server today in a different context.

I'm in favor of the result. I can't help but feel that etcd will need to solve this ANYWAY and the idea that we're both going to arrive at very similar results is somewhat annoying. I mean, etcd is an API, just a bit lower level. Can that be fixed with less net work?

smarterclayton · 2014-08-13T14:23:15Z

@erictune @thockin any comments down this road?

thockin · 2014-08-23T20:02:30Z

docs/proposals/isolate_minions_from_etcd.md

+    3. Implement efficient "watch for changes over HTTP" to offer comparable function with etcd
+    4. Ensure that the apiserver can scale at or above the capacity of the etcd system.
+    5. Implement authorization scoping for the minions that limits the data they can view
+


Teach etcd how to do identity and access-control directly.

smarterclayton · 2014-08-23T21:32:43Z

I've updated with option 4, teach etcd how to do identity and acl. Of note is that it would be a different identity and acl solution than the apiserver, which means that auth(z|n) would have to be managed differently for those calls. It's also less advantageous for isolating the details of the data store from the implementation of the clients. I suspect option 3 is going to be useful in general for other people anyway since it allows apiserver clients to watch for changes to pods on minions for accounting / monitoring / service discovery purposes.

smarterclayton · 2014-09-03T15:56:51Z

Is there any further feedback on this proposal? @lavalamp and I have sorted through most of the remaining issues for Pods, and before that final push I want agreement on the pattern.

smarterclayton · 2014-09-03T16:14:27Z

One open question to @thockin - should what the kubelet "sees" of a pod be fundamentally different than what the client sets for a pod? We discussed in the past having two different ways of looking at pods, but then in other conversations I keep seeing things that are relevant for the kubelet to care about (labels, resourceVersion, id, etc). Assuming this is accepted, perhaps we should do a quick proposal on what the kubelet -> apiserver interface should be and get agreement on that as well prior to me closing out my issue.

smarterclayton · 2014-09-03T22:31:50Z

Just going to add this rather than wait for comments... :)

Modeling the interface that the Kubelet would see to determine the block of pods scheduled for it:

The Kubelet needs the Manifest and the pod ID at a minimum, but probably would be better suited by seeing the pod template
There may be data that the Kubelet should not see about the pod
The set of pods that are scheduled on a minion is transactional - it must be possible to see them as a single unit, and it should be impossible for a kubelet to see a set of pods that have a constraint violation (meaning changes have to be delivered in order)
It must be efficient to retrieve and watch those pods
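One way to read the transactional and ordering requirements is that the kubelet only ever applies a whole, versioned snapshot and ignores anything that is not newer than what it already has. This is a minimal sketch. `Pod`, `PodList`, and `podSyncer` are simplified stand-ins, and treating `ResourceVersion` as a monotonically increasing `uint64` (like an etcd index) is an assumption.

```go
package main

import "fmt"

// Pod is a trimmed, hypothetical stand-in for the API type.
type Pod struct {
	ID string
}

// PodList is the full set of pods bound to one minion, stamped with a
// single version so it can be applied as one atomic unit.
type PodList struct {
	ResourceVersion uint64 // assumed to increase monotonically
	Items           []Pod
}

// podSyncer holds the last snapshot the kubelet applied.
type podSyncer struct {
	current PodList
}

// Apply replaces the whole pod set, but only with a strictly newer snapshot,
// so stale or replayed snapshots never move the kubelet backwards.
func (s *podSyncer) Apply(next PodList) bool {
	if next.ResourceVersion <= s.current.ResourceVersion {
		return false
	}
	s.current = next
	return true
}

func main() {
	var s podSyncer
	fmt.Println(s.Apply(PodList{ResourceVersion: 2, Items: []Pod{{ID: "a"}, {ID: "b"}}})) // true
	fmt.Println(s.Apply(PodList{ResourceVersion: 1, Items: []Pod{{ID: "a"}}}))           // false
	fmt.Println(len(s.current.Items))                                                     // 2
}
```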

There are two primary ways to model that:

Mimic the existing etcd model of a ContainerManifestList

# returns an api.PodList or similar, with an atomic version stamp as ResourceVersion
GET /minionPods/<host>
# returns an api.PodList or similar, with an atomic version stamp as ResourceVersion
GET /watch/minionPods/<host>

Filter the pod list by the CurrentState.Host and watch on that same filter

# returns an api.PodList with an atomic version stamp, but not all fields are set
GET /pods?field=CurrentState.Host%3D<host>
# returns events with singular pods
GET /watch/pods?field=CurrentState.Host%3D<host>
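For the field-filter option, the client has to URL-encode the selector, which is why the `=` appears as `%3D` above. Here is a small sketch using Go's `net/url`. The `field` parameter name and the `CurrentState.Host` path are taken from the examples in this comment, `http://apiserver` is a placeholder base URL, and `podsForHostURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// podsForHostURL builds the filtered list or watch URL for the second option.
// url.Values.Encode query-escapes the value, turning "=" into "%3D".
func podsForHostURL(base string, watch bool, host string) string {
	path := "/pods"
	if watch {
		path = "/watch/pods"
	}
	q := url.Values{}
	q.Set("field", "CurrentState.Host="+host)
	return base + path + "?" + q.Encode()
}

func main() {
	fmt.Println(podsForHostURL("http://apiserver", false, "minion-1"))
	// http://apiserver/pods?field=CurrentState.Host%3Dminion-1
	fmt.Println(podsForHostURL("http://apiserver", true, "minion-1"))
	// http://apiserver/watch/pods?field=CurrentState.Host%3Dminion-1
}
```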

The subtle difference between the two is that under the covers we are modeling the scheduled pods using a single atomic key so that we can impose constraints atomically. I think that's an important distinction - we guarantee under the covers that you see an atomic set of pods at all times. The latter proposal therefore has to convert an atomic list into a stream of update notifications, which requires it to keep the previous state in memory and compute a delta between the two. The latter also assumes that the Pods in the API are the same as the pods that are scheduled, and if there is ever any delta between the two, then the Pods API would return different values for different states.
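The extra work the filtered watch implies can be sketched as a diff between consecutive atomic snapshots. This is a hypothetical helper. It assumes each snapshot is a map from pod ID to a per-pod version string, and the event names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// EventType names the per-pod notifications a filtered watch would emit.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

type Event struct {
	Type  EventType
	PodID string
}

// diffPodSets converts two consecutive snapshots (pod ID -> version) into
// per-pod events. The caller must retain prev, which is the in-memory state
// the second option requires.
func diffPodSets(prev, next map[string]string) []Event {
	var events []Event
	for id, v := range next {
		old, ok := prev[id]
		switch {
		case !ok:
			events = append(events, Event{Added, id})
		case old != v:
			events = append(events, Event{Modified, id})
		}
	}
	for id := range prev {
		if _, ok := next[id]; !ok {
			events = append(events, Event{Deleted, id})
		}
	}
	// Map iteration order is random; sort by pod ID for deterministic output.
	sort.Slice(events, func(i, j int) bool { return events[i].PodID < events[j].PodID })
	return events
}

func main() {
	prev := map[string]string{"a": "1", "b": "1"}
	next := map[string]string{"b": "2", "c": "1"}
	for _, e := range diffPodSets(prev, next) {
		fmt.Println(e.Type, e.PodID)
	}
	// DELETED a
	// MODIFIED b
	// ADDED c
}
```

A consumer applying these events one at a time can briefly observe a set of pods that never existed in any snapshot (for example `a`, `b` and `c` together if an `ADDED` is applied before a `DELETED`). This is the atomicity concern raised above.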

After having tried both implementations, I lean much more strongly towards the former. It allows the pods on the minion to be clearly versioned and watched (watch returns a PodList or similar, rather than a stream of individual pod events). It more directly exposes the fact that scheduled pods are an atomic block, and matches the underlying model more closely. It also allows the API of /minionPods to vary from /pods and to be API versioned differently.

thockin · 2014-09-04T04:59:59Z

Reading now, sorry this one got lost.


thockin · 2014-09-04T07:33:24Z

re: API, just a quick sketch of how my brain wants the API to be factored.

https://github.com/thockin/kubernetes/compare/api_proposal

On Wed, Sep 3, 2014 at 10:39 PM, Tim Hockin thockin@google.com wrote:

On one hand, we have asked CoreOS folks for ACLs on etcd. On the other
hand, we're proposing to abstract them away here. I don't want to take
this decision too lightly - it really has repercussions. On the other
hand, I am a big fan of controlling my own destiny.

Option 2 seems like the sort of thing etcd folks could help with. They
want to be able to serve this role. If they gave us a null-proxy and let
us on it, that makes option 2 a lot more plausible.
Perhaps that is how we implement (prototype) ACLs.

Option 3 is clearly the most work, but the most thorough abstraction.

The biggest argument against option 4 is that it leaves Kubelets coupled
to etcd. I'm not very worried about that, really. I mean, great if we
don't, but kubelet already supports an array of input modes.

I'm OK with option 3, but I want us to think really hard about how much
work it will be vs option 2 (or a hybrid of 2 & 4, which I think the etcd
folks would be motivated to help with). This change sets a tone for the
role of etcd in kubernetes for the long term.

To the open question about what Kubelet sees of a pod. It is my belief
that we would do well to separate the structs that we receive from a user
from the structs we store over time from the structs we publish to kubelet.
They might overlap a lot or contain one another for convenience, but I
think they are different concepts that we model poorly today. I think this
can/should be fixed regardless of which approach we take on this proposal.

For example, I find it weird and annoying that creating a pod requires me
to say "desiredState". Of course it is the desired state. I find it weird
and annoying that we have this structure of Pod { PodState { Manifest } },
but we only show Manifest to Kubelet, which does not have proper ID or name.

I'd much rather show Kubelet a BoundPod structure, which looks a whole lot
like a Pod with some fields added and some removed.

I'd much rather a user POST a PodSpec structure, which looks like a
Manifest with some extra fields, and internally convert it to a PodState,
which was a superset of PodSpec. I am not sure what REST rules say about
that.

I can get more concrete if needed, I guess.
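Here is a rough Go sketch of the three-way split described here: a user-facing `PodSpec`, a stored `Pod`, and a kubelet-facing `BoundPod`. All type and field names are illustrative assumptions, not the structs in the linked branch. The projection simply keeps identity, placement, and spec, and drops the observed status.

```go
package main

import "fmt"

// Container is a trimmed-down, hypothetical container description.
type Container struct {
	Name  string
	Image string
}

// PodSpec is what a user would POST: roughly a Manifest plus a few fields.
type PodSpec struct {
	Containers []Container
}

// Pod is what the apiserver stores: the user's spec plus identity and
// system-managed state. Field names are illustrative only.
type Pod struct {
	ID              string
	Labels          map[string]string
	ResourceVersion uint64
	Spec            PodSpec
	Host            string // set by the scheduler
	Status          string // observed state; the kubelet is not told to run this
}

// BoundPod is what a kubelet would be shown: a Pod with some fields removed.
type BoundPod struct {
	ID     string
	Labels map[string]string
	Host   string
	Spec   PodSpec
}

// toBoundPod projects a stored Pod into the kubelet-facing shape.
func toBoundPod(p Pod) BoundPod {
	return BoundPod{ID: p.ID, Labels: p.Labels, Host: p.Host, Spec: p.Spec}
}

func main() {
	p := Pod{
		ID:     "web-1",
		Labels: map[string]string{"app": "web"},
		Spec:   PodSpec{Containers: []Container{{Name: "nginx", Image: "nginx"}}},
		Host:   "minion-1",
		Status: "Running",
	}
	b := toBoundPod(p)
	fmt.Println(b.ID, b.Host, len(b.Spec.Containers)) // web-1 minion-1 1
}
```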


smarterclayton · 2014-09-04T14:05:10Z

Can't comment directly, doing it here

https://github.com/thockin/kubernetes/compare/api_proposal?diff=unified#diff-9ff06ad723720f9428a65da3710cf436R60

This seems like PodTemplate

https://github.com/thockin/kubernetes/compare/api_proposal?diff=unified#diff-9ff06ad723720f9428a65da3710cf436R76

Agree this is a good separation.

https://github.com/thockin/kubernetes/compare/api_proposal?diff=unified#diff-9ff06ad723720f9428a65da3710cf436R129

I think I'd been leaning this way as well, in which case we're talking about a distinct resource for pods anyway and we need to clearly call it out as separate. I don't see pods being bound to multiple servers in the Kube system (disagreement?).

In general I don't disagree with an alignment like that.

bgrant0607 · 2014-09-04T20:57:33Z

See also #1178 -- API support for diff'ing, which also includes some cleanup/refactoring. It covers all the objects, not just pods, and proposes nesting of JSONBase within a field and moving labels there, among other things.

The pod template issue is #170. I agree that PodSpec looks similar. I don't get why it includes JSONBase. PodSpec should be embedded into both Pod and PodTemplate.

Why is the manifest in PodStatus?

Does BoundPod really need to be different from Pod? I suppose eventually we'll want Kubelet to return different CurrentStatus for the pod than the apiserver. If we split desired and current state into completely separate messages/types, we'd only need to fork the current state types, but I suppose that would be less RESTful.

Ideally, Kubelet would follow the same API conventions as our other APIs, however. I'd like it to be possible to target "free-range" Kubelets with the config system.
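The suggestion that PodSpec be embedded into both Pod and PodTemplate maps naturally onto Go struct embedding. Here is a minimal sketch with placeholder fields. `Images`, `RestartPolicy`, `Labels`, and `newPodFromTemplate` are assumptions.

```go
package main

import "fmt"

// PodSpec holds the fields shared by anything that describes a pod to run.
// Its fields are hypothetical placeholders; note it carries no identity.
type PodSpec struct {
	Images        []string
	RestartPolicy string
}

// PodTemplate embeds PodSpec; a replication controller would stamp pods from it.
type PodTemplate struct {
	PodSpec
	Labels map[string]string
}

// Pod embeds the same PodSpec and adds identity and placement.
type Pod struct {
	PodSpec
	ID   string
	Host string
}

// newPodFromTemplate copies the embedded spec wholesale. This is a shallow
// copy, so the Images slice is shared with the template.
func newPodFromTemplate(id string, tmpl PodTemplate) Pod {
	return Pod{PodSpec: tmpl.PodSpec, ID: id}
}

func main() {
	tmpl := PodTemplate{PodSpec: PodSpec{Images: []string{"nginx"}, RestartPolicy: "Always"}}
	p := newPodFromTemplate("web-1", tmpl)
	// Embedded fields are promoted: p.RestartPolicy reads p.PodSpec.RestartPolicy.
	fmt.Println(p.ID, p.RestartPolicy, p.Images[0]) // web-1 Always nginx
}
```

Because the fields are promoted, code that only cares about the spec can handle a `Pod` and a `PodTemplate` the same way through their `PodSpec`.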

thockin · 2014-09-05T00:06:04Z

It feels like we are converging on a largish API overhaul. Who is going to be responsible for owning the process of collecting changes, weighing options, and writing up a proposal? I'll do it, but I'm afraid I am already stretched thin...

smarterclayton · 2014-09-05T02:25:46Z

I've got some bandwidth to do so. Right now I see the diff issue, name and namespace, pod -> minion, and resource version on all operations. Others?

smarterclayton · 2014-09-05T02:26:04Z

And pod template

lavalamp · 2014-09-05T18:47:18Z

If we're doing a large API overhaul, we should cut version v1beta2 right now with our current api. (so that these big changes will land in v1beta3.)

I can improve the conversion functions to add any needed functionality; right now it's not super easy to move info between hierarchical levels.
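To illustrate what "moving info between hierarchical levels" involves, here is a hand-written conversion between a nested versioned shape and a flatter internal one. It is a hypothetical sketch, not the project's conversion machinery. All types are trimmed stand-ins, and copying the pod ID into the manifest ID is an assumption made for the example.

```go
package main

import "fmt"

// Versioned (external) shape: the manifest is nested under DesiredState,
// echoing the Pod { PodState { Manifest } } structure discussed above.
type v1beta1Manifest struct {
	ID     string
	Images []string
}

type v1beta1PodState struct {
	Manifest v1beta1Manifest
	Host     string
}

type v1beta1Pod struct {
	ID           string
	DesiredState v1beta1PodState
}

// internalPod is a flatter, hypothetical internal shape.
type internalPod struct {
	ID     string
	Images []string
	Host   string
}

// toInternal lifts nested fields up to the top level.
func toInternal(in v1beta1Pod) internalPod {
	return internalPod{
		ID:     in.ID,
		Images: in.DesiredState.Manifest.Images,
		Host:   in.DesiredState.Host,
	}
}

// fromInternal pushes fields back down. The manifest ID is simply copied
// from the pod ID, which is an assumption of this sketch.
func fromInternal(in internalPod) v1beta1Pod {
	return v1beta1Pod{
		ID: in.ID,
		DesiredState: v1beta1PodState{
			Manifest: v1beta1Manifest{ID: in.ID, Images: in.Images},
			Host:     in.Host,
		},
	}
}

func main() {
	internal := internalPod{ID: "web-1", Images: []string{"nginx"}, Host: "minion-1"}
	ext := fromInternal(internal)
	fmt.Println(ext.DesiredState.Manifest.ID, ext.DesiredState.Host) // web-1 minion-1
	back := toInternal(ext)
	fmt.Println(back.ID == internal.ID && back.Host == internal.Host) // true
}
```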

bgrant0607 · 2014-09-05T20:41:33Z

@lavalamp What changes would go into v1beta2?

@dchen1107 also urgently needs some API changes.

thockin · 2014-09-06T16:23:04Z

OK, I anoint @smarterclayton as the cat-herder for API proposals for
v1beta3. I suggest we collect a handful of short design issues and then
start to work out a new version of api/types.go, and plans for how to make
each change.

On my list, so far

refactoring pods/manifests
distinguishing the thing we give to kubelet from the thing we hold in the API
distinguishing what we accept from users from the thing we hold in the API
names and IDs
a common pattern for "one of" plugins
clarify JSONBase wrt lists

We should set a relatively short time horizon to get ideas written up, some
window for discussion, and then be done. We cannot still be talking about
this come October.


smarterclayton · 2014-09-06T16:42:35Z

Discussion on each issue started next week, with references in Kubernetes-dev. Rough concrete examples of API syntax for folks to debate next week. Identification of anything out of scope next week. Debate and refinement into week after?

kubernetes-bot · 2014-11-26T04:58:28Z

Can one of the admins verify this patch?

bgrant0607 · 2014-12-03T16:18:14Z

Could we update this doc with the current plan and put it in the design doc directory?

smarterclayton · 2014-12-08T03:09:12Z

Yes, will do.


smarterclayton · 2015-01-04T05:22:19Z

Updated (minor edits, can be more drastic if needed).


brendandburns reviewed Aug 12, 2014

smarterclayton mentioned this pull request Aug 20, 2014

Allow Kubelet to watch for changes to pods scheduled on it #846

Closed

thockin reviewed Aug 23, 2014

smarterclayton force-pushed the isolate_kubelet_from_etcd branch from 1874549 to e54e076 August 23, 2014 21:30

bgrant0607 mentioned this pull request Sep 4, 2014

Inconsistent usage of ID vs Name #1135

Closed

smarterclayton mentioned this pull request Sep 9, 2014

Proposal: v1beta3 API overhaul #1225

Merged

bgrant0607 added kind/design Categorizes issue or PR as related to design. kind/documentation Categorizes issue or PR as related to documentation. labels Sep 10, 2014

jbeda mentioned this pull request Sep 17, 2014

Add support for using client certificates with etcd #1338

Closed

bgrant0607 added this to the v0.9 milestone Oct 4, 2014

bgrant0607 added area/api Indicates an issue on api area. area/security area/etcd labels Oct 4, 2014

jdef mentioned this pull request Oct 10, 2014

issue tracking: isolate kubelet from etcd mesosphere/kubernetes-mesos#38

Closed

jbeda force-pushed the master branch from 89ee618 to f61d434 October 16, 2014 23:44

erictune mentioned this pull request Nov 26, 2014

Kubelet to understand pods, and to be able to pull from apiserver #2483

Closed

bgrant0607 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Dec 3, 2014

bgrant0607 assigned erictune Dec 3, 2014

Proposal: Isolate kubelet from etcd

99977ce

Discusses the current security risks posed by the kubelet->etcd pattern and discusses some options. Triggered by kubernetes#846 and referenced in kubernetes#859

smarterclayton force-pushed the isolate_kubelet_from_etcd branch from e54e076 to 99977ce January 4, 2015 05:21

erictune added a commit that referenced this pull request Jan 6, 2015

Merge pull request #860 from smarterclayton/isolate_kubelet_from_etcd

58abb40

Proposal: Isolate kubelet from etcd

erictune merged commit 58abb40 into kubernetes:master Jan 6, 2015

smarterclayton deleted the isolate_kubelet_from_etcd branch February 11, 2015 02:21

vishh pushed a commit to vishh/kubernetes that referenced this pull request Apr 6, 2016

Merge pull request kubernetes#860 from pwittrock/no_cache_versioninfo

39290ee

Always to a live lookup of version info instead of caching.

b3atlesfan pushed a commit to b3atlesfan/kubernetes that referenced this pull request Feb 5, 2021

Merge pull request kubernetes#860 from tomdee/simplify-packaging

1e0f20e

Makefile: Cleanup, alpine and amd64 only UDP
