Apparently random NoCredentialsError after running for a while #1006
Comments
there's definitely something funky with creds based on all the recent issues logged. we need a reliable test case where we can compare debug botocore and aiobotocore logs |
I wish I could provide it but I haven't managed to reproduce this locally yet, just in production after running for a while. |
I noticed a similar issue happening when reading/writing to S3 with a process count > 5 on version 2.4.2 |
any interesting info with debug level logging? |
To add some additional context on this that might help untangle the issue:
Would it be a better approach to have a long-lived session instantiated in the class instead of creating a new one every time? |
long lived session/client always preferred. botocore should take care of refreshing credentials. |
If that's the case we should probably document it, especially if it can cause bugs like this one. |
can you try once the release with #1022 is available? |
could be related to #1025, I'd try once that release is available (later today) |
actually the important part isn't the session, it's the client: you should keep your client around for as long as possible. A client is tied to a connection pool, so it's expensive to keep re-creating them |
to debug this I really need a reproducible test case. I have my own AWS account, so if you can create a fully encapsulated test case I can try to debug this; otherwise there just isn't enough for me to go on here and I'll have to close it. Another option is to create a test case using moto |
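For anyone attempting that, a fully self-contained test case against moto's standalone server could look roughly like the sketch below (aiobotocore talks to moto over HTTP, so the in-process mock decorators don't apply). The port, queue name, and dummy credentials are placeholder assumptions, and this only exercises the call path; it doesn't recreate the IAM-role credential setup that seems to trigger the error.

import asyncio

from aiobotocore.session import get_session
from moto.server import ThreadedMotoServer

async def hammer_sqs(endpoint_url: str) -> None:
    session = get_session()
    async with session.create_client(
        "sqs",
        region_name="us-east-1",
        endpoint_url=endpoint_url,
        aws_access_key_id="testing",      # dummy creds for moto
        aws_secret_access_key="testing",  # dummy creds for moto
    ) as client:
        queue = await client.create_queue(QueueName="repro-queue")
        for i in range(1000):
            await client.send_message(QueueUrl=queue["QueueUrl"], MessageBody=f"msg {i}")

def main() -> None:
    # Run moto as a local HTTP server so the async client can talk to it.
    server = ThreadedMotoServer(port=5001)  # assumed free port
    server.start()
    try:
        asyncio.run(hammer_sqs("http://127.0.0.1:5001"))
    finally:
        server.stop()

if __name__ == "__main__":
    main()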
The problem is that, given that the client is an async context manager, there's no nice/elegant way to have a long-lived client. You'd need to enter it manually and create some teardown hook to exit. |
sure there is, we do this all the time:

import contextlib
from aiobotocore.session import get_session

session = get_session()

class SQSService:
    def __init__(self, sqs_region: str, sqs_url: str):
        self.default_source = "unknown"
        self.sqs_region = sqs_region
        self.sqs_url = sqs_url
        self._exit_stack = contextlib.AsyncExitStack()

    async def __aenter__(self):
        # Enter the client's context once and keep it for the object's lifetime.
        self._client = await self._exit_stack.enter_async_context(
            session.create_client("sqs", region_name=self.sqs_region))
        return self

    async def __aexit__(self, *args):
        await self._exit_stack.__aexit__(*args) |
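For what it's worth, a minimal sketch of how such a wrapper might be kept alive for the whole application lifetime in an aiohttp app via a cleanup context; the app key, region, and queue URL are illustrative assumptions, not something from this thread:

from aiohttp import web

async def sqs_service_ctx(app: web.Application):
    # Hypothetical wiring: one SQSService (and therefore one client and
    # connection pool) for the application's lifetime; torn down on shutdown.
    service = SQSService(
        sqs_region="us-east-1",  # placeholder
        sqs_url="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
    )
    app["sqs_service"] = await service.__aenter__()
    yield
    await service.__aexit__(None, None, None)

def create_app() -> web.Application:
    app = web.Application()
    app.cleanup_ctx.append(sqs_service_ctx)
    return app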
This is the kind of pattern I'd love to see documented. |
I think we assumed it was common knowledge but open to PRs / issues to add to docs |
I'd like to be able to get to the bottom of what's causing this issue as well though. Unfortunately we'll need some sort of way to reproduce |
Do we have a solution for this yet? I'm still experiencing this. However, I thought the issue was not random, and occurred with almost every call. But it could be because I only looked at later logs. I wonder if explicit passing of access_key and secret_key would resolve this? |
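For reference, explicitly passing static credentials to the client would look like the sketch below (placeholder values; note that with IAM roles there are usually no long-lived keys to pass, so this may only mask whatever is going wrong in the refresh path):

from aiobotocore.session import get_session

async def send_message(body: str) -> dict:
    session = get_session()
    async with session.create_client(
        "sqs",
        region_name="us-east-1",                   # placeholder
        aws_access_key_id="AKIAXXXXXXXXXXXXXXXX",  # placeholder
        aws_secret_access_key="xxxxxxxx",          # placeholder
    ) as client:
        return await client.send_message(
            QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
            MessageBody=body,
        )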
we need a way to repro or a detailed analysis from someone who can repro |
This issue has been marked as stale because it has been inactive for more than 60 days. Please update this pull request or it will be automatically closed in 7 days. |
what's funny is we're hitting something like this as well on AWS. I'm guessing the AWS call periodically fails for some reason. |
Describe the bug
We have an aiohttp server that sends SQS messages as a result of certain actions. After running for a while we'll get a NoCredentialsError.
Our code that triggers the issue in production, where we use IAM roles:
We've tried multiple versions including 2.0.0 and 2.5.0
After many many tests trying to find a way to reproduce the issue locally, we've managed to mitigate it using backoff. When we do, this is what we get:
This leads me to believe there's a race condition somewhere that only triggers after running for a while, where you might temporarily end up with missing credentials.
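The backoff-based mitigation described above can be sketched roughly as follows; the decorator arguments and function shape are assumptions about the general approach, not the reporter's actual code:

import backoff
from botocore.exceptions import NoCredentialsError

@backoff.on_exception(backoff.expo, NoCredentialsError, max_tries=5)
async def send_message_with_retry(client, queue_url: str, body: str):
    # Retries with exponential backoff when credentials come back missing,
    # papering over the window where the credential lookup transiently fails.
    return await client.send_message(QueueUrl=queue_url, MessageBody=body)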
Checklist
- pip check passes without errors
- pip freeze results

pip freeze results
Environment:
Additional context
Happy to provide any further context to help resolve this.