
Increasing http timeout for ec2metadata to 30 seconds #311

Conversation

mschurenko

Problem

I would like to have AWS IAM credentials fetched from the EC2 metadata service rather than provided via a file or environment variables.

See: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

To achieve this we are using kube2iam. The problem we have observed is that the default HTTP timeout for the ec2metadata client in the Go SDK is quite short (I believe it's 1 second).

This causes issues where ark fails to retrieve access keys from kube2iam with the following error in the ark controller log:

time="2018-01-16T19:04:06Z" level=error msg="error listing backups" error="rpc error: code = Unknown desc = AccessDenied: Access Denied\n\tstatus code: 403, request id: D8973E7C07118308, host id: xW8x2FdgzNtzSDhs+unUdpyE/mQ4MiAg1KbrWTk8E279gza3uAjQU+WU6eZFl9Tz/j00Gw+e8KQ=" logSource="pkg/controller/backup_sync_controller.go:80"

This is a known issue with kube2iam.

The solution appears to be to increase the timeout in the client SDK. The Python SDK (boto) supports an AWS_METADATA_SERVICE_TIMEOUT environment variable, which could be used to solve the problem. Unfortunately, it seems that the Go SDK provides no such mechanism.

I know this is not an issue specific to ark, but since many people use kube2iam for temporary IAM credentials, it is likely they will encounter the same issue.

Solution

This PR increases the ec2metadata client's HTTP timeout to 30 seconds, which solves the problem.

If there is a better or simpler way to increase the timeout than what I've done in this PR, that would be great.

I had to override the entire IAM credential provider chain just to change the defaults for the EC2RoleProvider.

Also, setting the timeout to 30 seconds was fairly arbitrary. I'm not sure why it would be an issue to set it this high (or even higher), since the EC2RoleProvider is the last provider in the credential chain.
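For context, here is a minimal sketch of the kind of override described above. This is not the exact diff in this PR; the package paths are from aws-sdk-go v1, the function name is illustrative, and the 30-second value mirrors the one used here.

package example

import (
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

// newConfigWithSlowMetadata builds an aws.Config whose credential chain ends
// in an EC2RoleProvider backed by a metadata client with a 30-second timeout.
func newConfigWithSlowMetadata() (*aws.Config, error) {
	// This session is used only to construct the ec2metadata client.
	metaSess, err := session.NewSession(aws.NewConfig().
		WithHTTPClient(&http.Client{Timeout: 30 * time.Second}))
	if err != nil {
		return nil, err
	}

	// Rebuild the provider chain: environment variables, shared credentials
	// file, then EC2 instance role credentials via the slow metadata client.
	creds := credentials.NewChainCredentials([]credentials.Provider{
		&credentials.EnvProvider{},
		&credentials.SharedCredentialsProvider{},
		&ec2rolecreds.EC2RoleProvider{
			Client: ec2metadata.New(metaSess),
		},
	})

	return aws.NewConfig().WithCredentials(creds), nil
}

The drawback, as noted in the review comment below, is that hand-rolling the chain like this loses the default credential loading order that the SDK's session package normally provides.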

Signed-off-by: Matt Schurenko <matt.schurenko@gmail.com>
Contributor (review comment on config.WithCredentials(creds)):

We can't do this as-is because it overrides the default credentials loading order that we get from https://github.com/aws/aws-sdk-go/blob/fb9d53b0db7e801eb0d4fa021f5860794d845da3/aws/session/session.go#L416-L469. But I do think we can come up with a solution. We can either copy the aws session logic and modify it as needed, or, if we're ok with changing the timeout for all ec2 calls, I think you could just set config.HTTPClient and it should trickle down to the ec2 role provider's metadata client.

Author (reply):

If we're ok with a 30-second timeout across the board, then adding config.WithHTTPClient to getSession seems to work:

func getSession(config *aws.Config) (*session.Session, error) {
	// Use a 30-second timeout for every HTTP call made through this session,
	// including the EC2 metadata requests used by the EC2 role provider.
	config.WithHTTPClient(&http.Client{Timeout: 30 * time.Second})
	sess, err := session.NewSession(config)
	if err != nil {
		return nil, errors.WithStack(err)
	}

	if _, err := sess.Config.Credentials.Get(); err != nil {
		return nil, errors.WithStack(err)
	}

	return sess, nil
}

@ncdc (Contributor) commented Feb 12, 2018

@mschurenko thanks for your PR! We're wrapping up the v0.7.0 release later this week, so we can look to get this in after that. Please see my comment about a behavioral change - let's discuss here the best way to get you what you need.

@mschurenko (Author)

@ncdc thanks for the quick response. It would be great to get this into the next release if possible. Please see my reply to your review.

@mschurenko (Author)

@ncdc as it turns out, the problem was not solved by increasing the http.Client timeout (it was just a coincidence that it worked during my testing).

The problem appears to be a race condition between ark and kube2iam, where ark fetches the credentials before kube2iam has discovered the pod. If I create an init container that sleeps for a few seconds, the problem goes away.

I'm not sure if we should try to fix this in ark by expiring the credential cache when it gets an AccessDenied. The behaviour is a bit odd: ark is getting a valid access key from kube2iam, it is just the wrong access key.
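For illustration only, a rough sketch of that idea (this is not ark's code; retryOnAccessDenied is a hypothetical helper, and in ark the AWS error may be wrapped, so the plain type assertion is a simplification):

package example

import (
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
)

// retryOnAccessDenied retries a call once after expiring cached credentials,
// so the provider chain re-fetches them (by which time kube2iam may have
// discovered the pod).
func retryOnAccessDenied(sess *session.Session, call func() error) error {
	err := call()
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "AccessDenied" {
		// Expire forces Credentials.Get to retrieve fresh credentials
		// the next time they are needed.
		sess.Config.Credentials.Expire()
		err = call()
	}
	return err
}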

Also, let me know if I should just close this PR and open an issue. Increasing the http.Client timeout does work, but for us it was not the actual cause of the problem.

@ncdc (Contributor) commented Feb 15, 2018

@mschurenko since this isn't solving the problem, how about we close this PR and move it to an issue? Thanks!

@mschurenko closed this Feb 15, 2018
@mschurenko (Author)

Closing PR and will open an issue
