Watchdog failed to retrieve AWS security credentials after 1 hour #83
Comments
Thanks for the feedback. We’ll have someone take a look.
Hey @xian13, thanks for the feedback. Can you provide the following info?
Thanks.
Hi @Cappuccinuo, here is the information needed.
Thanks for all the info. I opened an issue on the ECS agent side for further investigation/fix. The root cause is that the EFS volume is not unmounted after the task and container are killed. Because the IAM role credentials are fetched from the container environment itself, once the task is killed the credentials link becomes invalid, which causes the credential retrieval to fail. The expected behaviour is that the volume resources are cleaned up after the task/container is killed. The temporary workaround for now is to manually unmount the volumes on the host, so that the EFS watchdog can clean up the stunnels for the unused volumes.
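For reference, a rough sketch of that manual workaround on the container instance (the mount path below is a placeholder; use whatever stale mount points actually show up on your host):

```bash
# List the NFS mounts currently present on the host; EFS mounts made
# through efs-utils with TLS show up as 127.0.0.1:/ nfs4 mounts
findmnt -t nfs4

# Manually unmount the stale EFS mount left behind by the killed task
# (replace the path with the actual stale mount point from the list above)
sudo umount /path/to/stale/efs/mount

# Once the stale mount is gone, the watchdog should stop logging the
# credential errors and clean up the corresponding stunnel process
ps aux | grep -i [s]tunnel
```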
I'm using EFS as a volume in my ECS task definition: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-efsvolumeconfiguration.html
The task definition is deployed on EC2 instances.
My task definition setup
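Roughly, the relevant part of the task definition looks like this (the family, image, file system ID, and paths below are placeholders, not the real values from my setup):

```json
{
  "family": "my-app",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "my-app-image",
      "mountPoints": [
        { "sourceVolume": "efs-data", "containerPath": "/data" }
      ]
    }
  ],
  "volumes": [
    {
      "name": "efs-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-12345678",
        "rootDirectory": "/",
        "transitEncryption": "ENABLED",
        "authorizationConfig": {
          "iam": "ENABLED"
        }
      }
    }
  ]
}
```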
At first everything is fine, but after an hour many errors are printed in
/var/log/amazon/efs/mount-watchdog.log
The already-mounted EFS can still be accessed for both read and write.
If a new container is created, I'm still not sure whether it can access EFS without errors, but I think it should be OK, because if new containers couldn't access EFS after an hour there would already be many errors in my production apps.
The current problem is high CPU load, because I think the watchdog keeps printing to the log.
So all my t3 CPU credits are being used up printing this log.
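For anyone hitting the same symptom, a quick sketch of how I checked that the watchdog is what is consuming the CPU (the systemd unit name may differ depending on your AMI and efs-utils version):

```bash
# Watch how fast the watchdog log is growing
ls -lh /var/log/amazon/efs/mount-watchdog.log

# Check CPU usage of the efs-utils watchdog process
ps aux | grep -i [m]ount-watchdog

# On systemd-based hosts the watchdog usually runs as this unit
systemctl status amazon-efs-mount-watchdog
```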