Intermediate nodes that fail to start #990
Comments
Are you defining your own bootstrap command? It appears that So Do you see any stderr from this line? https://github.com/awslabs/amazon-eks-ami/blob/master/files/pull-sandbox-image.sh#L6
I did not modify the bootstrap command. "Do you see any stderr from this line?" I already deleted the faulty node, so I can't check it now.
My guess is that If you're creating a large number of nodes in a short period of time, you might be hitting a rate limit. I think the relevant Service Quota would be @suket22 have you seen something like this before?
Looks like a bug in any case, this script has no handling for
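For illustration only: the comments above raise rate limiting as a possible cause and note that the script lacks handling for some failure case. A minimal sketch of one common mitigation, assuming the failure is a transient error from the image pull, is to retry the pull with exponential backoff. The function name `retry_with_backoff` and its arguments are hypothetical and are not part of amazon-eks-ami.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not code from amazon-eks-ami):
# run a command, retrying with exponential backoff on failure.
#   $1 = maximum number of attempts
#   $2 = initial delay in seconds (doubled after each failed attempt)
#   remaining arguments = the command to run
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1 rc=0
  shift 2
  while true; do
    # Return immediately once the command succeeds.
    "$@" && return 0
    rc=$?
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return "$rc"
    fi
    echo "attempt ${attempt} failed (exit ${rc}); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

Wrapping whatever command performs the pull, for example `retry_with_backoff 5 2 <pull command>`, would let a throttled first attempt recover instead of leaving the unit failed. Whether this matches the actual root cause here is not confirmed in the thread.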
@cartermckinnon Any thoughts on baking the pause image directly into the AMI? This seems like the right thing to do, but could be somewhat challenging given we don't know which container runtime will be used until the bootstrap script runs. I may take a closer look into what it will take.
What happened:
Every couple of days, we have a node that isn't able to boot and is stuck in NotReady.
Here are the logs:
When I manually start the sandbox-image service, it works:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- AWS Region: us-east-1
- Instance Type(s): c4.8xlarge (but not only)
- EKS Platform version (aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.7
- Kubernetes version (aws eks describe-cluster --name <name> --query cluster.version): 1.20
- Kernel (uname -a): Linux ip-10-208-73-71.ec2.internalec2ssa.info 5.4.204-113.362.amzn2.x86_64 #1 SMP Wed Jul 13 21:34:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Release information (cat /etc/eks/release on a node):