Kill the store #32

mikekap · 2016-11-14T05:23:33Z

Unfortunately there are still bugs in the store w.r.t. ordering. I can't figure out how to track them down anymore :(.

Instead just get rid of the store and use NewIndexerInformer to create a custom pod ip -> pod index. Also only index pods that are "active" - i.e. not succeeded or failed.

Use cache indexers

jtblin

I'm not familiar with the IndexerInformer API and am not sure what problem you are trying to solve. Do you have more info e.g. error messages, symptoms, etc. about the issues you've encountered?

In the end, this starts to diverge quite a bit from the kubernetes DNS implementation e.g. https://github.com/kubernetes/kubernetes/blob/master/pkg/dns/dns.go#L217 that this code is originally based on and might introduce side effects that I am unable to really grasp being unfamiliar with what the IndexerInformer API does.

jtblin · 2016-11-27T22:43:29Z

cmd/k8s.go

 type k8s struct {
 	*client.Client
+	podStore      kcache.Indexer


Technically, it's an Indexer not a store. Indexers contain store, so let's rename that to podIndexer.

jtblin · 2016-11-28T03:09:29Z

cmd/server.go

-	err = backoff.Retry(operation, backoff.NewExponentialBackOff())
-	if err != nil {
-		return "", err
+	exponential := backoff.NewExponentialBackOff()


Good idea 👍

Wonder if we want to try for more than one second and if we should set the number of retries as well though.

jtblin · 2016-12-04T21:54:23Z

cmd/k8s.go

+	for i, pod := range pods {
+		podNames[i] = pod.(*api.Pod).ObjectMeta.Name
+	}
+	return nil, fmt.Errorf("%d pods (%v) with the ip %s indexed", len(pods), podNames, IP)


Not sure how indexer works exactly, but the code doesn't seem to do anything when a pod is removed so wouldn't there be a case old pods have been indexed at a specific IP and the IP has been recycled for a new pod? In that case, the code would return an error here.

The indexer takes care of calling this on every add/update (on both new & old pods)/remove and keeping the internal mapping up-to-date.

mikekap · 2016-12-12T03:24:31Z

Unfortunately, I was still seeing mismaps (i.e. pods getting credentials from other pods or pods not getting their own credentials) when using the manual add/remove code. Indexer can maintain an index the same way we try to in store.go, so I switched to using that. Since I deployed this to our cluster, we haven't seen a single mismap or missing IAM mapping.

Definitely agree that it would be awesome to keep this the same as kube-dns. Looking at the code that kube-dns uses, it's likely there are race conditions in that code as well. However, folks are unlikely to ever experience them since kube-dns only watches for Service object updates - these objects almost never change. When pods are added/removed Endpoint objects change, and kube-dns only uses those for headless services - those without a cluster ip. I've run into a very similar issue with headless services at kubernetes/kubernetes#34590 .

kube-proxy is the one that has an update loop that watches for pod updates for a service ip. The logic for keeping track of updates is at https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/config/config.go#L125 , but interestingly ADD/REMOVE are never used - the underlying watcher only ever sends SET (i.e. "current state of the world"): https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/config/api.go#L45 . You can see that in the iptables-based implementation: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L578 assumes all the endpoints are passed every time.

kube-controller-manager takes a very similar approach to updating Endpoint objects for services. Whenever a pod gets added or removed, the entire endpoint list for a service is recomputed. In net, I wouldn't trust add/update/remove callbacks for anything other than notifications. We should just resync the "state of the world" on every callback.

When implementing that exact scheme here, I found that Indexer does exactly that internally. I used that instead. I'm not 100% sure why e.g. kube dns doesn't use Indexer, but it works really well for kube2iam's use case.

negz · 2017-02-27T03:10:54Z

Am I correct in my reading of the current store implementation that each Kube2IAM instance will only be able to grant roles for pods it has seen created or updated during its lifetime? If so, this PR seems like a better implementation - will it be merged?

jtblin · 2017-02-27T04:35:38Z

@negz

Am I correct in my reading of the current store implementation that each Kube2IAM instance will only be able to grant roles for pods it has seen created or updated during its lifetime?

No, on startup the Informer API will send all the pods that have been created. It works exactly the same as the current kube dns implementation.

jrnt30 · 2017-08-07T15:52:02Z

@mikekap I'm curious if you are still running this or if you have opted for a different solution. I would be willing to take a look at the branch conflicts and submit a PR if it would be considered by @jtblin but the credential caching for jobs in the Completed state from the Jobs api has become a blocker for us that needs to be addressed

mikekap · 2017-08-07T17:16:29Z

@jrnt30 we're still running it. The Completed issue was one thing this fixed, but we were running into sporadic issues with pods that couldn't fetch credentials or would fetch credentials from other pods.

jtblin · 2017-08-10T04:48:09Z

@jrnt30 would be totally happy to consider an updated PR. I actually have that in my todos for a while but have been pretty busy with different things lately.

Additional complexity is that there is now support for namespaces as well but moving to IndexerInformer definitely seems to be the way to go for kube2iam.

jrnt30 · 2017-08-10T12:06:38Z

Thanks for the reply. I started working on this yesterday and actually found that as well. Knowing the PR will be considered is very helpful, I'll see if I can come up with something.

jtblin · 2017-09-04T07:24:30Z

Deprecated in favour of #92. Thanks!

mikekap added 4 commits October 24, 2016 22:24

Add a ton more and autofixing to attempt to fix occasional iam mismaps

3330e85

Handle pods marked Completed/Failed

115ba5c

Fix deadlock

1c0f124

Ok that's it; kill the store

1cd6167

Use cache indexers

mikkeloscar mentioned this pull request Nov 27, 2016

Don't hang forever if there is no role defined for a pod #28

Closed

jtblin reviewed Dec 4, 2016

View reviewed changes

smelchior mentioned this pull request Jan 31, 2017

Pods are sometimes assigned to the incorrect IAM role #46

Closed

jtblin force-pushed the master branch from 2adfe6d to 307c1bb Compare June 14, 2017 09:52

bismarck mentioned this pull request Jun 23, 2017

default-role race condition #80

Open

jtblin force-pushed the master branch 3 times, most recently from 1581bdb to 12fa75c Compare June 24, 2017 13:12

jrnt30 mentioned this pull request Aug 12, 2017

Updating to use IndexInformer #92

Merged

jtblin closed this Sep 4, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kill the store #32

Kill the store #32

mikekap commented Nov 14, 2016 •

edited

Loading

jtblin left a comment

jtblin Nov 27, 2016

jtblin Nov 28, 2016

jtblin Dec 4, 2016

mikekap Dec 12, 2016

mikekap commented Dec 12, 2016

negz commented Feb 27, 2017

jtblin commented Feb 27, 2017 •

edited

Loading

jrnt30 commented Aug 7, 2017

mikekap commented Aug 7, 2017

jtblin commented Aug 10, 2017

jrnt30 commented Aug 10, 2017

jtblin commented Sep 4, 2017

Kill the store #32

Kill the store #32

Conversation

mikekap commented Nov 14, 2016 • edited Loading

jtblin left a comment

Choose a reason for hiding this comment

jtblin Nov 27, 2016

Choose a reason for hiding this comment

jtblin Nov 28, 2016

Choose a reason for hiding this comment

jtblin Dec 4, 2016

Choose a reason for hiding this comment

mikekap Dec 12, 2016

Choose a reason for hiding this comment

mikekap commented Dec 12, 2016

negz commented Feb 27, 2017

jtblin commented Feb 27, 2017 • edited Loading

jrnt30 commented Aug 7, 2017

mikekap commented Aug 7, 2017

jtblin commented Aug 10, 2017

jrnt30 commented Aug 10, 2017

jtblin commented Sep 4, 2017

mikekap commented Nov 14, 2016 •

edited

Loading

jtblin commented Feb 27, 2017 •

edited

Loading