Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating to use IndexInformer #92

Merged
merged 9 commits into from
Sep 17, 2017
Merged

Conversation

jrnt30
Copy link
Collaborator

@jrnt30 jrnt30 commented Aug 12, 2017

NOTE: This is the start of a rebase/rework for #32. There ended up being some significant changes that made it difficult to integrate the changes easily and I ended up moving around a bit of the logic from store and server in an attempt to make it a bit more reusable and testable.
Very open to feedback but wanted to get some eyes on this from @jtblin and @mikekap if they have time to ensure I'm heading down the right path.

I have done some initial testing in our staging cluster but need to spend some more time next week going through some of the edge cases on in server/ and test some cross-account roles tested

  • Removed reliance on the store for maintain state and
    migrated to the IndexInformer. This now will only cache
    "active" pods (no failed/completed pods will be returned)
  • Extracted role based fallback logic to it's own processor
    and removed from the Server/Event handlers
  • Migrated event handlers to the k8s package
  • Added in processor tests for the new RoleProcessor

- Removed reliance on the store for maintain state and
migrated to the IndexInformer.  This now will only cache
"active" pods (no failed/completed pods will be returned)
- Extracted role based fallback logic to it's own processor
and removed from the Server/Event handlers
- Migrated event handlers to the k8s package
- Moved the `cmd` package back to the base to allow for easy builds
and acknowlding the fact that the project itself is the unit of
deployment and it's unlikely this would be used as a library in and of
itself
- Added in processor tests for the new RoleProcessor
@coveralls
Copy link

Coverage Status

Coverage increased (+18.3%) to 28.205% when pulling 83e2771 on jrnt30:index-refactor into 12fa75c on jtblin:master.

@jtblin
Copy link
Owner

jtblin commented Aug 17, 2017

Thanks @jrnt30, can you please revert this:

Moved the cmd package back to the base to allow for easy builds
and acknowledging the fact that the project itself is the unit of
deployment and it's unlikely this would be used as a library in and of
itself

I will start reviewing when this is done. It's the "standard" directory structure for go projects and I would like to keep it as-is. Moreover it creates unnecessary noise in the PR making it harder to review.

Justin Nauman added 2 commits August 17, 2017 05:24
@jrnt30
Copy link
Collaborator Author

jrnt30 commented Aug 17, 2017

@jtblin Sure thing, that is done and conflicts merged with master.

Copy link
Owner

@jtblin jtblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Just a few nitpicks for now, I will be travelling so will continue the review during the weekend. Also it seems there are some tests failing in the build.

k8s/k8s.go Outdated
@@ -3,6 +3,8 @@ package k8s
import (
"time"

"fmt"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put before "time" dep.

server/server.go Outdated
"github.com/jtblin/kube2iam/iam"
"github.com/jtblin/kube2iam/k8s"
"github.com/jtblin/kube2iam/store"
"github.com/jtblin/kube2iam/processor"
"k8s.io/client-go/tools/cache"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the k8s.io dep to the other external deps i.e. after log "github.com/sirupsen/logrus".

package processor

import (
"fmt"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep deps orderered:

  1. Std lib e.g. fmt, encoding/json
  2. external deps
  3. kube2iam deps

@jrnt30
Copy link
Collaborator Author

jrnt30 commented Aug 18, 2017

Thanks for the feedback. The imports structure makes a lot of sense, those are cleaned up.

The tests themselves were an oversight when I moved main back into the cmd/ package but didn't change the Coveralls configuration back.

That's been fixed as well.

@coveralls
Copy link

Coverage Status

Coverage increased (+18.3%) to 28.205% when pulling bedae78 on jrnt30:index-refactor into e9db60b on jtblin:master.

Copy link
Owner

@jtblin jtblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience. Few more changes requested, it's a huge PR so it's expected, almost there. Good work overall! 👍

cover.sh Outdated
@@ -11,13 +11,12 @@ echo "mode: count" > coverage.out
# Initialize error tracking
ERROR=""

declare -a packages=('' \
'cmd' \
declare -a packages=('cmd' \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that precents root files to be tested. Please follow the original usage and add "processor" after "k8s".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are no *.go files now, Coveralls will complain if the '' is left in. Moved the other to be sorted.

Makefile Outdated
@@ -26,10 +26,10 @@ setup:
gometalinter --install --update
glide install --strip-vendor

build: *.go fmt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing that ooc? Was that creating any issue for you?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we removed the root *.go files it didn't trigger properly.

k8s/k8s.go Outdated
@@ -13,12 +14,18 @@ import (

const (
// Resync period for the kube controller loop.
resyncPeriod = 30 * time.Minute
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please order alphabetically.

k8s/k8s.go Outdated
)

// Client represents a kubernetes client.
type Client struct {
*kubernetes.Clientset
podIndexer cache.Indexer
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please order alphabetically. It makes finding things easier.

 	namespaceController *cache.Controller
 	namespaceIndexer    cache.Indexer
 	podController       *cache.Controller
 	podIndexer          cache.Indexer

k8s/k8s.go Outdated
@@ -75,5 +137,7 @@ func NewClient(host, token string, insecure bool) (*Client, error) {
if err != nil {
return nil, err
}
return &Client{client}, nil
k8s := &Client{}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplify to: return &Client{Clientset: client}, nil

ar := GetNamespaceRoleAnnotation(ns, r.namespaceKey)
for _, role := range ar {
if r.iam.RoleARN(role) == roleArn {
log.Debugf("Role:%s on namespace:%s found.", roleArn, namespace)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space after :

return true
}
}
log.Warnf("Role: [%s] on namespace: [%s] not found.", roleArn, namespace)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the [] are unnecessary and better use fields for structured logging. See *Fields convenience methods in other places.

@@ -0,0 +1,241 @@
package processor
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding all these tests! 👍

There's no test for GetRoleMapping, maybe not strictly necessary though as you're testing the other methods but if not too painful could be nice.

server/server.go Outdated
}

if !synced {
log.Fatalf("Attempted to wait for caches to be synced for [%d] however it is not done. Giving up.", defaultCacheSyncAttempts)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit ... for %d attempts, however ...

server/server.go Outdated

synced := false
for i := 0; i < defaultCacheSyncAttempts && !synced; i++ {
synced = cache.WaitForCacheSync(nil, podSynched, namespaceSynched)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that should prevent the race condition for default role? 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was more around ensuring that both the pods and namespaces were indexed to ensure that you couldn't "sneak in" a role. This only happens @ startup time, so there is the possibility that you could get the default role immediately if the new pod hasn't been indexed.

What this PR aims to solve is a permanent caching of those incorrect roles.

@dadux
Copy link

dadux commented Aug 23, 2017

We've tried running of this branch in our dev environments, but still get a lot of "default role" under load (our build-engineering team spinning up 100s of concurrent jobs)

@jtblin
Copy link
Owner

jtblin commented Aug 23, 2017

Thanks for the info @dadux, that's annoying. Looking at the code, it only falls back to the default role if the pod is found by its ip, there is no annotation, and there is a default role set. So if it's getting the default role when it shouldn't, it can only be because the cache has not been updated yet.

The solutions I can think of are:

  1. Make a synchronous API call in case of fallback to ensure an annotation hasn't been set. There would be a latency impact obviously and a new code path to manage.
  2. Move the fallback to the default role down the chain so that we return an error from extractRoleARN. That will trigger the exponential backoff operation retry and should be able to catch the updated annotation. Latency impact as well but at least no need for additional synchronous API call.
  3. Remove the default role option as it is not reliable.

Anyhow let's not try to tackle that in this PR for now. We can look into it after.

@jrnt30
Copy link
Collaborator Author

jrnt30 commented Aug 23, 2017

@jtblin Thanks for the guidance and constructive feedback, it's very helpful. I have addressed a majority of these issues. The processor renaming I have left as is for now and will come up with something better.

@coveralls
Copy link

Coverage Status

Coverage increased (+8.3%) to 18.212% when pulling 12a6bcc on jrnt30:index-refactor into e9db60b on jtblin:master.

Copy link
Owner

@jtblin jtblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I got distracted. I think it's the last nit apart from package naming. I'd suggest just going with "mappings" for now or something like that.
I was also waiting for @dadux to confirm that everything was looking good with the binary built from this branch.

Makefile Outdated
@@ -29,7 +29,7 @@ setup:
build: *.go fmt
go build -o build/bin/$(ARCH)/$(BINARY_NAME) $(GOBUILD_VERSION_ARGS) github.com/jtblin/$(BINARY_NAME)/cmd

build-race: *.go fmt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change this one back as well now.

namespaceKey string
namespaceRestriction bool
iam *iam.Client
kubeStore store
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant changing the variable name from kubeStore to store in the struct as well pls.

@jrnt30
Copy link
Collaborator Author

jrnt30 commented Aug 31, 2017

I also know that @placydo was looking at it a bit. Are your experiencing ok so far?

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling c8e937a on jrnt30:index-refactor into ** on jtblin:master**.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 775c7cf on jrnt30:index-refactor into ** on jtblin:master**.

@jtblin
Copy link
Owner

jtblin commented Sep 1, 2017

I've pushed docker image jtblin/kube2iam:dev for convenience if someone wants to try in its cluster. Would really appreciate some feedback given the size of the PR.

@placydo
Copy link

placydo commented Sep 1, 2017

Hey. Currently I am in middle of something. I will try to pull this image today and give it a try + test it over the weekend. Will come back with feedback asap. Thanks guyz!

@jtblin
Copy link
Owner

jtblin commented Sep 4, 2017

@struz did you guys see any issue running this on our cluster?

@jtblin jtblin changed the title WIP: Updating to use IndexInformer Updating to use IndexInformer Sep 4, 2017
@placydo
Copy link

placydo commented Sep 11, 2017

@jtblin @jrnt30 We have not seen this issue anymore. Everything works as expected!

return false
}

// DumpDebugInfo outputs all the roles by IP addresr.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/addresr/address/

@jtblin jtblin merged commit c5ef1ec into jtblin:master Sep 17, 2017
@jtblin
Copy link
Owner

jtblin commented Sep 18, 2017

Thanks a ton again @jrnt30 for the great work. This is now merged and released as 0.8.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants