
Get credentials from role has become slow #2972

Closed
alexd765 opened this issue Nov 25, 2019 · 22 comments · Fixed by #3066 or #3068

@alexd765

Please fill out the sections below to help us address your issue.

Version of AWS SDK for Go?

v1.25.41

Version of Go (go version)?

go version go1.11.1 linux/amd64

What issue did you see?

Getting credentials from an ec2 role takes 20 seconds.

We have been seeing this issue consistently on multiple different EC2 instances since a couple of days ago.

To me it looks like an issue with the service, but I am not sure.

Steps to reproduce

package slow

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

func getCredentialsFromRole() (*credentials.Credentials, error) {
	roleProvider := &ec2rolecreds.EC2RoleProvider{
		Client: ec2metadata.New(session.Must(session.NewSession())),
	}
	creds := credentials.NewCredentials(roleProvider)

	start := time.Now().UTC()
	if _, err := creds.Get(); err != nil { // this takes 20 seconds
		return nil, err
	}
	fmt.Printf("getting credentials from role took %s\n", time.Since(start))

	return creds, nil
}
@diehlaws diehlaws self-assigned this Nov 25, 2019
@diehlaws
Contributor

Thanks for reaching out to us about this @alexd765. The behavior you describe sounds like it is taking a long time for the EC2 metadata service on your instance(s) to respond to a request for instance profile credentials. Do you see this behavior only on specific instance types/sizes? (If so, which ones?) Or does this occur consistently regardless of the instance type/size being used? Do you see a similar delay when attempting to reach the instance metadata service in a manner that does not involve the SDK (e.g. time curl http://169.254.169.254/latest/)?

@diehlaws diehlaws added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 25, 2019
@alexd765
Author

alexd765 commented Nov 26, 2019

I see it on t2.micro, t2.small, c5.4xlarge.

We are in eu-west-1.

curl http://169.254.169.254/latest/ is instant on the affected machines.

Interestingly, only getting the credentials from the role is slow. Getting credentials from environment variables is instant.

@nlundbo

nlundbo commented Nov 26, 2019

We also saw similar behavior. We noticed issues on these operations:

  • S3 -> ListBuckets
  • ec2MetadataService -> Region()

Both took 20+ seconds to complete. Going back to the previous version (for us it was v1.25.27) made things quick again; we saw the issue in both 1.25.40 and 1.25.41.

The above curl is also super fast for us (only tested on t2.small).

@jasdel
Contributor

jasdel commented Nov 26, 2019

Thanks for raising this issue, @alexd765, @nlundbo. Is your application running in a Kubernetes pod, Docker container, or using any IP forwarding/proxy?

If so, you may be impacted by a recent security change to EC2 instances, where the Instance Metadata Service limits the number of network hops a request can make. Increasing the hop limit on the instance should take care of the long timeouts on EC2 for customers who use IP forwarding or proxying.

The EC2 ModifyInstanceMetadataOptions operation can be used to update the hop limit needed for your application's use case.

See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html for more information on EC2's IMDS update.
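For reference, a minimal sketch of making that ModifyInstanceMetadataOptions call from Go with aws-sdk-go (v1.25.38 or later, which is when the operation was added). The instance ID below is a placeholder, and the caller needs permission for ec2:ModifyInstanceMetadataOptions:

```go
// Sketch: raising the IMDS hop limit from Go, so containerized workloads
// behind the Docker bridge can reach IMDSv2. Instance ID is a placeholder.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)

	out, err := svc.ModifyInstanceMetadataOptions(&ec2.ModifyInstanceMetadataOptionsInput{
		InstanceId:              aws.String("i-0123456789abcdef0"), // placeholder
		HttpEndpoint:            aws.String(ec2.InstanceMetadataEndpointStateEnabled),
		HttpPutResponseHopLimit: aws.Int64(2), // one extra hop for the Docker bridge
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.InstanceMetadataOptions)
}
```

This is the Go equivalent of the `aws ec2 modify-instance-metadata-options` CLI command mentioned later in the thread.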

@nlundbo

nlundbo commented Nov 28, 2019

Yes, my use case was code inside a Docker container. I tested changing the number of hops, and it resolved the issue. I read somewhere that it should not affect deployments in ECS; I have not verified this yet.

@pmalekn

pmalekn commented Dec 2, 2019

I'm also seeing this issue (I believe it's the same) and I've been able to pinpoint v1.25.38 as the culprit release.

Before v1.25.38 I can see the following requests being made by the sdk (when running on a docker container on EC2):

DEBU[2019-12-02T15:53:48.399Z] aws-sdk-message="[DEBUG: Request ec2metadata/GetMetadata Details:\n---[ REQUEST POST-SIGN ]-----------------------------\nGET /latest/meta-data/iam/security-credentials/ HTTP/1.1\r\nHost: 169.254.169.254\r\nUser-Agent: aws-sdk-go/1.25.37 (go1.13.4; linux; amd64)\r\nAccept-Encoding: gzip\r\n\r\n\n-----------------------------------------------------]"
DEBU[2019-12-02T15:53:48.4Z] aws-sdk-message="[DEBUG: Response ec2metadata/GetMetadata Details:\n---[ RESPONSE ]--------------------------------------\nHTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 38\r\nAccept-Ranges: none\r\nContent-Type: text/plain\r\nDate: Mon, 02 Dec 2019 15:53:48 GMT\r\nLast-Modified: Mon, 02 Dec 2019 14:55:48 GMT\r\nServer: EC2ws\r\n\r\n\n-----------------------------------------------------]"

while on v1.25.38 and newer:

DEBU[2019-12-02T16:00:54.872Z] aws-sdk-message="[DEBUG: Request ec2metadata/GetToken Details:\n---[ REQUEST POST-SIGN ]-----------------------------\nPUT /latest/api/token HTTP/1.1\r\nHost: 169.254.169.254\r\nUser-Agent: aws-sdk-go/1.25.38 (go1.13.4; linux; amd64)\r\nContent-Length: 0\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip\r\n\r\n\n-----------------------------------------------------]"
DEBU[2019-12-02T16:00:59.873Z] aws-sdk-message="[DEBUG: Send Request ec2metadata/GetToken failed, attempt 0/3, error RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers)]"
DEBU[2019-12-02T16:00:59.914Z] aws-sdk-message="[DEBUG: Request ec2metadata/GetToken Details:\n---[ REQUEST POST-SIGN ]-----------------------------\nPUT /latest/api/token HTTP/1.1\r\nHost: 169.254.169.254\r\nUser-Agent: aws-sdk-go/1.25.38 (go1.13.4; linux; amd64)\r\nContent-Length: 0\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip\r\n\r\n\n-----------------------------------------------------]"
DEBU[2019-12-02T16:01:04.914Z] aws-sdk-message="[DEBUG: Send Request ec2metadata/GetToken failed, attempt 1/3, error RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers)]"
DEBU[2019-12-02T16:01:05.009Z] aws-sdk-message="[DEBUG: Request ec2metadata/GetToken Details:\n---[ REQUEST POST-SIGN ]-----------------------------\nPUT /latest/api/token HTTP/1.1\r\nHost: 169.254.169.254\r\nUser-Agent: aws-sdk-go/1.25.38 (go1.13.4; linux; amd64)\r\nContent-Length: 0\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip\r\n\r\n\n-----------------------------------------------------]"
DEBU[2019-12-02T16:01:10.009Z] aws-sdk-message="[DEBUG: Send Request ec2metadata/GetToken failed, attempt 2/3, error RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers)]"
DEBU[2019-12-02T16:01:10.191Z] aws-sdk-message="[DEBUG: Request ec2metadata/GetToken Details:\n---[ REQUEST POST-SIGN ]-----------------------------\nPUT /latest/api/token HTTP/1.1\r\nHost: 169.254.169.254\r\nUser-Agent: aws-sdk-go/1.25.38 (go1.13.4; linux; amd64)\r\nContent-Length: 0\r\nX-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600\r\nAccept-Encoding: gzip\r\n\r\n\n-----------------------------------------------------]"
DEBU[2019-12-02T16:01:15.191Z] aws-sdk-message="[DEBUG: Send Request ec2metadata/GetToken failed, attempt 3/3, error RequestError: send request failed\ncaused by: Put http://169.254.169.254/latest/api/token: net/http: request canceled (Client.Timeout exceeded while awaiting headers)]"
// Fallback to "/latest/meta-data/iam/security-credentials/"
DEBU[2019-12-02T16:01:15.191Z] aws-sdk-message="[DEBUG: Request ec2metadata/GetMetadata Details:\n---[ REQUEST POST-SIGN ]-----------------------------\nGET /latest/meta-data/iam/security-credentials/ HTTP/1.1\r\nHost: 169.254.169.254\r\nUser-Agent: aws-sdk-go/1.25.38 (go1.13.4; linux; amd64)\r\nAccept-Encoding: gzip\r\n\r\n\n-----------------------------------------------------]"
DEBU[2019-12-02T16:01:15.192Z] aws-sdk-message="[DEBUG: Response ec2metadata/GetMetadata Details:\n---[ RESPONSE ]--------------------------------------\nHTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 38\r\nAccept-Ranges: none\r\nContent-Type: text/plain\r\nDate: Mon, 02 Dec 2019 16:01:15 GMT\r\nLast-Modified: Mon, 02 Dec 2019 15:55:34 GMT\r\nServer: EC2ws\r\n\r\n\n-----------------------------------------------------]"

I run this app without AWS keys in environment variables and without a shared profile.

@tyrken

tyrken commented Dec 2, 2019

@pmalekn - They introduced the new EC2 metadata feature in v1.25.38 - and a backwards-compatibility issue by setting the default hop limit to 1, which means the replies get dropped while transiting the Docker bridge network.

You have to increase the hop limit yourself, or it just hits lengthy timeouts and retries before eventually falling back to the old method.

@pmalekn

pmalekn commented Dec 3, 2019

@tyrken Ok... so you're probably referring to 7814a7f#diff-68798fbeff74277f6e5c5ab6f1cf92c6R70454-R70460

Adding a snippet since the file is too big to show when linked service/ec2/api.go

// The metadata options for the instance.
type InstanceMetadataOptionsRequest struct {
	_ struct{} `type:"structure"`

	// This parameter enables or disables the HTTP metadata endpoint on your instances.
	// If the parameter is not specified, the default state is enabled.
	//
	// If you specify a value of disabled, you will not be able to access your instance
	// metadata.
	HttpEndpoint *string `type:"string" enum:"InstanceMetadataEndpointState"`

	// The desired HTTP PUT response hop limit for instance metadata requests. The
	// larger the number, the further instance metadata requests can travel.
	//
	// Default: 1
	//
	// Possible values: Integers from 1 to 64
	HttpPutResponseHopLimit *int64 `type:"integer"`

It then makes sense that it cannot reach the endpoint if I'm calling it from inside a docker container (because there's one more hop through docker bridge right?) or is my reasoning flawed?

How can I change this "globally" then? Or do I have to do it on a per-request basis (that doesn't sound feasible)?

@tyrken

tyrken commented Dec 3, 2019

No global option I know of, though hoping AWS might see sense & implement something like #2980

In the meantime you can only stay on an old version of aws-sdk-go or update all EC2 instances to increase the hop limit, e.g. with:

aws ec2 modify-instance-metadata-options --instance-id i-34215432543254235 --http-endpoint enabled --http-put-response-hop-limit 2

@alexd765
Author

alexd765 commented Dec 3, 2019

Our problem is the same as described above.

aws ec2 modify-instance-metadata-options --instance-id i-34215432543254235 --http-endpoint enabled --http-put-response-hop-limit 2
is a temporary solution.

I really hope there will be a better solution in the future.
Running Go with this sdk in a docker container on an EC2 instance is one of the most common use cases of this sdk that I can think of. This really should just work out of the box.

skotambkar added a commit that referenced this issue Jan 7, 2020
…etadata client (#3066)

This PR addresses the issues related to the EC2Metadata client having long timeouts when it fails to obtain an EC2Metadata token while making a request to IMDS.

The PR reduces the timeout to 1 second and the maximum number of retries to 2 for the EC2Metadata client. This should help reduce the long timeouts faced by customers.

Fixes #2972
aws-sdk-go-automation pushed a commit that referenced this issue Jan 7, 2020
===

### Service Client Updates
* `service/AWSMigrationHub`: Updates service API, documentation, and paginators
* `service/codebuild`: Updates service API and documentation
  * Add encryption key override to StartBuild API in AWS CodeBuild.
* `service/xray`: Updates service documentation
  * Documentation updates for xray

### SDK Enhancements
* `aws`: Add configuration option to enable the SDK to unmarshal API response header maps to normalized lower-case map keys. ([#3033](#3033))
  * Setting `aws.Config.LowerCaseHeaderMaps` to `true` will result in S3's X-Amz-Meta prefixed headers being unmarshaled to lower-case Metadata map keys.

### SDK Bugs
* `aws/ec2metadata` : Reduces request timeout for EC2Metadata client along with maximum number of retries ([#3066](#3066))
  * Reduces latency while fetching response from EC2Metadata client running in a container to around 3 seconds
  * Fixes [#2972](#2972)
aws-sdk-go-automation added a commit that referenced this issue Jan 7, 2020
Release v1.27.2 (2020-01-07)
===

(Release notes identical to the changelog above.)
@caleb15

caleb15 commented Dec 18, 2020

@alexd765 's solution worked for me - thanks Alex!

You can implement his solution in terraform like so:

resource "aws_instance" "your_ec2_instance_name" {
  params_go_here                     = var.blablabla
  
  metadata_options {
    # So docker can access ec2 metadata
    # see https://github.com/aws/aws-sdk-go/issues/2972
    http_put_response_hop_limit = 2
  }
}

Also note that it's not just Go that runs into this problem - even just using curl as described in https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html results in a long delay.

@julvo

julvo commented Jan 11, 2021

Ran into the same problem using the AWS go SDK in an Elastic Beanstalk (EB) Docker environment, but didn't see a way to set http_put_response_hop_limit in EB without manipulating the EC2 instance directly.

Fixed (or rather sidestepped) it by setting AWS user credentials and default region as environment variables - hope it helps others in the same situation.

@xubofei1983

We hit the same problem and the workaround works.

lfittl added a commit to pganalyze/collector that referenced this issue Mar 9, 2021
Previously we were re-using our shared HTTP client, which has a rather
high timeout (120 seconds) that causes the HTTP client to wait around
for a long time. This is generally intentional (since it includes the
time spent downloading a request body), but is a bad idea when running
into EC2's IMDSv2 service that has a network-hop based limit. If that
hop limit is exceeded, the requests just go to nowhere, causing the
client to wait for a multiple of 120 seconds (~10 minutes were observed).

This instead uses a special client for the EC2 instance metadata service
that has a much lower timeout (1 second, like in the AWS SDK itself), to
avoid the problem.

See also aws/aws-sdk-go#2972
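The same idea expressed in aws-sdk-go terms, as a rough sketch: hand the EC2 metadata client its own HTTP client with a short timeout instead of sharing a general-purpose client with a long one, so a dropped IMDSv2 response fails fast. (This only runs meaningfully on an EC2 instance.)

```go
// Sketch: a dedicated short-timeout HTTP client just for IMDS calls,
// mirroring the approach in the commit above (1 second, like the SDK's
// own post-fix default), plus a low retry cap.
package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	sess := session.Must(session.NewSession())

	meta := ec2metadata.New(sess, aws.NewConfig().
		WithHTTPClient(&http.Client{Timeout: 1 * time.Second}).
		WithMaxRetries(2))

	// On an instance with an unreachable IMDS, this now fails in seconds
	// rather than minutes.
	region, err := meta.Region()
	fmt.Println(region, err)
}
```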
lfittl added a commit to pganalyze/collector that referenced this issue Mar 10, 2021
(Commit message identical to the one above.)
@dulithag
Copy link

Hi, I am wondering: does this same issue exist for aws-sdk-go-v2?

@skmcgrail skmcgrail added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Mar 19, 2021
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Mar 20, 2021
@caleb15

caleb15 commented Aug 23, 2021

Can we remove the stale lifecycle on this one please, @skmcgrail? I'm presuming it's still an issue and the reason people aren't commenting on it is because of the workaround. A lot of people have run into this problem, and it would be good to have explicit confirmation once it is fixed.
