nodegroup with custom AMI doesn't join the cluster #622
Comments
When I use the first config file above with
Which means that we certainly will create public and private subnets. And to be sure, you can unfold the CloudFormation template with
So there are certainly 3x public and 3x private subnets, along with Internet and NAT gateways. Could this be related to #605, and/or somehow specific to your AWS account (if you have Direct Connect or something of that kind)?
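If it helps, one way to double-check what eksctl actually provisioned is to pull the generated CloudFormation template and list the subnets in the cluster VPC. This is only a sketch: the stack name assumes the default eksctl naming scheme for a cluster called staging, and the VPC ID is a placeholder you'd substitute from your own account.

aws cloudformation get-template --stack-name eksctl-staging-cluster

aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=<your-vpc-id>" \
  --query "Subnets[].{ID:SubnetId,CIDR:CidrBlock,AZ:AvailabilityZone}" \
  --output table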
Hey, thanks for the detailed response! I looked at #605 and I don't know if it was specifically related, since eksctl created the entire VPC and routes those few times. I have it working now, but I explicitly crafted my subnets per VPC and set up the NAT and IGW beforehand, then launched the same basic config file with no private worker nodes, and it came up fine. I honestly don't know :\
So I've managed to reproduce it. I'll re-test with the default AMI before I can confirm whether this is a bug or not.
I can confirm that removing
To be clear, if you must use Ubuntu, you'd have to use this:

apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig

metadata:
  name: staging
  region: us-east-1
  version: "1.10"
  tags:
    environment: staging
    creator: eksctl

vpc:
  cidr: "172.20.0.0/16"

nodeGroups:
  - name: ng-1-workers
    labels:
      role: workers
      nodegroup-type: backend-api-workers
    iam:
      withAddonPolicies:
        autoScaler: true
    instanceType: t3.medium
    desiredCapacity: 2
    privateNetworking: true
    amiFamily: Ubuntu1804
    # ami: ami-06fd8200ac0eb656d # (optional)
    allowSSH: true
    sshPublicKeyPath: /Users/petetaylor/.ssh/aws_stag_vpc.pub
    availabilityZones: ["us-east-1a", "us-east-1b"]
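Assuming that file is saved as cluster.yaml, it would be applied the same way as the original config:

eksctl create cluster -f cluster.yaml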
What I think we discovered was that the AMI was built for a different K8s version and so would not allow itself to be deployed on a newer version, breaking the whole install.
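A rough way to sanity-check for that kind of mismatch, assuming you can SSH to a worker built from the custom AMI, is to compare the kubelet version baked into the image with the control-plane version reported by the API server:

# on the worker node built from the custom AMI
kubelet --version

# locally, against the EKS control plane
kubectl version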
I also faced the same issue; I was using eksctl with parameters on the command line. I had added
I eventually found the issue by SSH-ing into the worker nodes and examining the kubelet logs. My particular problem was that some of the Docker images present were corrupted, so I had to remove all Docker images and restart Docker and the kubelet, and then things worked. I have no reason to believe this is a common scenario, so the general method is to just SSH into a worker node and look at the logs.
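For reference, a rough sketch of those steps, assuming an Ubuntu worker with systemd-managed Docker and kubelet; the key path and node IP are placeholders:

ssh -i ~/.ssh/aws_stag_vpc ubuntu@<worker-private-ip>

# inspect the kubelet logs for errors joining the cluster
sudo journalctl -u kubelet --no-pager | tail -n 200

# if images look corrupted, remove them and restart the services
sudo docker system prune --all
sudo systemctl restart docker kubelet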
What happened?
To be fair, there are other open issues surrounding this same kind of thing, but they did not seem to be identical to what was happening for me. I spoke with @errordeveloper and we decided it would be good to open this ticket.
After applying the config below, I received the following error:
This is the config I was using:
What you expected to happen?
From what I understand of the documentation, this should have produced a viable cluster. But it did not: the worker nodes could not attach to the EKS cluster. After discussing it with @errordeveloper on Slack, I tried the same config with privateNetworking set to false. That did not work either, unfortunately, and the worker nodes did not come online.
Interestingly, @errordeveloper suggested that I try running it again and creating a few node groups in a public zone explicitly. I did that, and all four nodes now show up and are available. That config is here:
At the end of the day, for what we are setting up right now, I don't actually need or want public-subnet nodes. We have a site-to-site VPN with AWS and we are mostly building in-house tools against the clusters; our public-facing site runs on other legacy architecture. So it would be better for me if that was not a required dependency, but if it turns out to be, that's alright.
How to reproduce it?
Run
eksctl create cluster -f cluster.yaml
with the contents of the first YAML file outlined above, the one that lists only private-subnet worker nodes.

Anything else we need to know?
My AWS credentials are fine; I have admin access, and the CloudFormation, EKS, EC2, VPC, IGW, NAT gateway, etc. resources can all be created without error.
Versions
Please paste in the output of these commands:
Also include your version of heptio-authenticator-aws.
I'm using Homebrew:
/usr/local/Cellar/aws-iam-authenticator/0.3.0/bin/aws-iam-authenticator
So 0.3.0 of that variant, anyway.
Logs
No specific logs. When it works with the second config, there are no meaningful logs; when it doesn't, the error is as above.