Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't hang forever if there is no role defined for a pod #28

Closed
mikkeloscar opened this issue Nov 3, 2016 · 4 comments
Closed

Don't hang forever if there is no role defined for a pod #28

mikkeloscar opened this issue Nov 3, 2016 · 4 comments

Comments

@mikkeloscar
Copy link
Contributor

Currently a call to metadata from a pod without a role will hang forever because it retries to get the role from the IP indefinitely: https://github.com/jtblin/kube2iam/blob/master/cmd/server.go#L61-L66.

If the retry really is needed I think it would be better to limit it to X number of retries and then respond. Hanging the connection forever is not nice for the calling application.

I know this could be solved by setting a default role, but it should also work when no default role is defined IMO.

I wouldn't mind making a PR fixing this, but I would like your (@jtblin) opinion before I start, if you don't mind.

  • Do we really need to retry or would it be ok to respond with a 404, this obviously gives a false positive if kube2iam just is too slow to recognize the pod annotation?
  • If we need to retry, can we limit it to a finite number of retries?
@mikkeloscar
Copy link
Contributor Author

mikkeloscar commented Nov 3, 2016

It just occurred to me that it could also be solved by simply asking the API server if the pod has a role annotation in case it's not already in the roleByIP table. What do you think about that?

@jtblin
Copy link
Owner

jtblin commented Nov 20, 2016

Currently a call to metadata from a pod without a role will hang forever because it retries to get the role from the IP indefinitely: https://github.com/jtblin/kube2iam/blob/master/cmd/server.go#L61-L66.

It won't hang forever but the defaults of the backoff algorithm is pretty long (15'): https://godoc.org/github.com/cenkalti/backoff#pkg-examples

If the retry really is needed I think it would be better to limit it to X number of retries and then respond. Hanging the connection forever is not nice for the calling application.

The retry is there to avoid a race condition in case we haven't got the role data yet. I agree that the defaults are too long.

It just occurred to me that it could also be solved by simply asking the API server if the pod has a role annotation in case it's not already in the roleByIP table. What do you think about that?

I would prefer that we change the values for the retry to a few seconds max rather than adding an API call. It's easy to do by configuring the backoff returned by backoff.NewExponentialBackOff().

@mikkeloscar
Copy link
Contributor Author

Hey, thanks for the feedback. Did you check any of the PRs (#30,#32). It seems like the try to solve related issues in different ways. If you plan to merge any of those then I'll wait with a fix for this. Otherwise I'll make a PR just reducing the Backoff limit like you suggest.

@jtblin
Copy link
Owner

jtblin commented Nov 27, 2016

I don't think #30 is an acceptable solution, entered my comments on the issue. I am not sure which problem #32 is trying to solve.

mikkeloscar added a commit to mikkeloscar/kube2iam that referenced this issue Nov 29, 2016
This makes the max interval and max elapsed time configurabel for the
exponential backoff used when getting a role based on source IP.

The defaults are still the same e.g. 1 minute for MaxInterval and 15
minutes for the MaxElapsedTime.

Fix jtblin#28
@jtblin jtblin closed this as completed in #33 Dec 4, 2016
jtblin pushed a commit that referenced this issue Dec 4, 2016
This makes the max interval and max elapsed time configurabe for the
exponential backoff used when getting a role based on source IP.

The defaults are still the same e.g. 1 minute for MaxInterval and 15
minutes for the MaxElapsedTime.

Fix #28

* Use sane max interval & max elapsed time defaults
negz pushed a commit to negz/kube2iam that referenced this issue Feb 26, 2017
This makes the max interval and max elapsed time configurabe for the
exponential backoff used when getting a role based on source IP.

The defaults are still the same e.g. 1 minute for MaxInterval and 15
minutes for the MaxElapsedTime.

Fix jtblin#28

* Use sane max interval & max elapsed time defaults
negz pushed a commit to negz/kube2iam that referenced this issue Feb 26, 2017
This makes the max interval and max elapsed time configurabe for the
exponential backoff used when getting a role based on source IP.

The defaults are still the same e.g. 1 minute for MaxInterval and 15
minutes for the MaxElapsedTime.

Fix jtblin#28

* Use sane max interval & max elapsed time defaults
negz pushed a commit to negz/kube2iam that referenced this issue Feb 26, 2017
This makes the max interval and max elapsed time configurabe for the
exponential backoff used when getting a role based on source IP.

The defaults are still the same e.g. 1 minute for MaxInterval and 15
minutes for the MaxElapsedTime.

Fix jtblin#28

* Use sane max interval & max elapsed time defaults
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants