Error when retrieving credentials from iam-role: Credential refresh failed, response did not contain: access_key, secret_key, token, expiry_time #1617
Comments
It looks like you're sourcing credentials from the EC2 instance metadata service and the request to fetch them failed. By default we don't retry those requests, but you can add retries with the AWS_METADATA_SERVICE_NUM_ATTEMPTS environment variable.
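For reference, a minimal sketch of that workaround, assuming the variables are set before the first boto3 client is created (the values here are illustrative, not recommendations):

```python
import os
import boto3

# Illustrative values: retry the instance metadata request up to 5 times
# and allow 10 seconds per attempt. botocore reads these environment
# variables when it resolves credentials, so they only need to be set
# (or exported in the container environment) before the first client exists.
os.environ["AWS_METADATA_SERVICE_NUM_ATTEMPTS"] = "5"
os.environ["AWS_METADATA_SERVICE_TIMEOUT"] = "10"

s3 = boto3.client("s3")  # credential resolution now retries on metadata failures
```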
@JordonPhillips Thank you for the response. So would this be a matter of adding those environment variables to the container where this is happening?
In my case I get the same error, but according to the kube2iam logs the request for creds does not fail.
@JordonPhillips What if we are sourcing creds from a k8s pod which runs on a k8s worker node (EC2 instance), so not directly on an EC2 instance? Do we set AWS_METADATA_SERVICE_NUM_ATTEMPTS as an env var on the pod? Is it still legit then? Thanks!
I'm also using kube2iam to have a pod assume an IAM role and seeing this error sporadically. It sometimes happens at the start of the container, but we've also seen it happen after the containers have been running for a while. Any suggestions on workarounds? We've already set AWS_METADATA_SERVICE_NUM_ATTEMPTS.
@shshe What does the botocore debug log level say? Also, do you use celery?
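A minimal sketch of enabling that logging with boto3's built-in helper (the "botocore.credentials" logger scopes the output to credential resolution):

```python
import logging
import boto3

# Stream botocore's credential-resolution logs to stderr so failed
# instance-metadata fetches are visible alongside the application error.
boto3.set_stream_logger("botocore.credentials", logging.DEBUG)

s3 = boto3.client("s3")  # subsequent credential lookups are now logged
```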
Hi @TattiQ, unfortunately we didn't have the DEBUG level turned on. I think our issue lies in kube2iam introducing latency when querying the EC2 metadata URI; there is a related kube2iam issue about exactly that. We're currently trying the workaround of setting AWS_METADATA_SERVICE_NUM_ATTEMPTS and AWS_METADATA_SERVICE_TIMEOUT.

Edit: Yes, we do use celery and saw this in our celery app. But we've also seen this issue crop up in a Kubernetes job that used multiprocessing + boto.
Hey guys, did you figure out this issue by any chance?
Following up on this issue. The solution provided by @JordonPhillips above should fix it. Is anyone still getting the error? If yes, please open a new issue; I would be happy to help.
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
We're having this issue specifically with K8s as well. Did setting the environment variables work?
Yes, setting the environment variables (specifically the retry attempts) seems to have mostly resolved the issue for us. We are also in an EKS K8s environment.
Hi, I was facing this issue running Python in a pod in an EKS cluster, and it seems at first glance the retries/timeout solution worked. Did anyone figure out a reason why these requests fail? I've seen pods restart hundreds of times because of this, and I'm curious if there is something in the EKS setup that can be used to mitigate it.
Bump on this, I am also seeing this issue with kube2iam/EKS.
Yep, still seeing this one year later.
Just noticed this too. Hundreds of restarts in one night when it never happened in the last 6 months. No configuration changes or anything.
This is also happening for us. (python ---> kube2iam ---> AWS)
Seems to be the error returned while handling boto/boto3#1751. The workaround when we hit this issue was to re-attach the instance metadata.
I ran into this issue while working in a Jupyter notebook on an EC2 instance. When it first started happening this month, all I had to do was rerun the code and it would work again on the second or third try. However, additional attempts stopped working for me this week. After struggling with a few different options which didn't work in my case, I finally decided to upgrade my Python (from 3.6 to 3.7) and dask (from 2021.11.01 to 2022.2.0), and that fixed the issue for me completely.
We are seeing a strange issue relating to boto3 and botocore. The following error is being thrown sporadically when we try to read from S3 or utilize an SQS client:

Error when retrieving credentials from iam-role: Credential refresh failed, response did not contain: access_key, secret_key, token, expiry_time

It appears that the credentials are not correctly getting refreshed via the assumed IAM role. This is a Python application running inside of a Docker container within EKS. An example piece of code is below.
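A minimal sketch of the kind of calls described above (bucket, key, and queue URL are placeholders, not the values from the original report):

```python
import boto3

# Placeholder resource names for illustration only.
BUCKET = "example-bucket"
KEY = "example/object.json"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

# Reading from S3 with credentials sourced from the assumed IAM role.
s3 = boto3.client("s3")
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()

# Polling an SQS queue with the same credential chain.
sqs = boto3.client("sqs")
messages = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
```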
Does anybody have any ideas why this is happening and whether or not this is a known issue with boto?