nodegroup with custom AMI doesn't join the cluster #622

Closed · jcoletaylor opened this issue Mar 11, 2019 · 10 comments

@jcoletaylor

What happened?
To be fair, there are other open issues surrounding this same kind of thing, but they did not seem to be identical to what was happening for me. I spoke with @errordeveloper and we decided it would be good to open this ticket.

After applying the config below, I received the following error:

[ℹ]  waiting for at least 2 node(s) to become ready in "ng-1-workers"
[✖]  timed out (after 20m0s) waitiing for at least 2 nodes to join the cluster and become ready in "ng-1-workers"

This is the config I was using:

apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig

metadata:
  name: staging
  region: us-east-1
  tags: 
    environment: staging
    creator: eksctl

vpc:
  cidr: "172.20.0.0/16"

nodeGroups:
  - name: ng-1-workers
    labels:
      role: workers
      nodegroup-type: backend-api-workers
    iam:
      withAddonPolicies:
        autoScaler: true
    instanceType: t3.medium
    desiredCapacity: 2
    privateNetworking: true
    ami: ami-06fd8200ac0eb656d
    allowSSH: true
    sshPublicKeyPath: /Users/petetaylor/.ssh/aws_stag_vpc.pub

availabilityZones: ["us-east-1a", "us-east-1b"]

What you expected to happen?
From what I understand of the documentation, this should have produced a viable cluster. But it did not: the workers could not attach to the EKS control plane. After discussion with @errordeveloper on Slack, I tried the same config with privateNetworking set to false. That did not work either, unfortunately, and the worker nodes did not come online.

Interestingly, @errordeveloper suggested that I try running it and creating some node groups explicitly in a public subnet. I did that, and all four nodes now show up and are available. That config is here:

apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig

metadata:
  name: staging
  region: us-east-1
  tags: 
    environment: staging
    creator: eksctl

vpc:
  cidr: "172.20.0.0/16"

nodeGroups:
  - name: ng-1-public-workers 
    labels:
      role: workers
      nodegroup-type: frontend-api-workers
    iam:
      withAddonPolicies:
        autoScaler: true
    instanceType: t2.small
    desiredCapacity: 2
    privateNetworking: false
    allowSSH: true
    sshPublicKeyPath: /Users/petetaylor/.ssh/aws_stag_vpc.pub
  - name: ng-2-private-workers
    labels:
      role: workers
      nodegroup-type: backend-api-workers
    iam:
      withAddonPolicies:
        autoScaler: true
    instanceType: t3.medium
    desiredCapacity: 2
    privateNetworking: true
    allowSSH: true
    sshPublicKeyPath: /Users/petetaylor/.ssh/aws_stag_vpc.pub

availabilityZones: ["us-east-1a", "us-east-1b"]

At the end of the day, for what we are setting up right now, I don't actually need or want public-subnet nodes. We have a site-to-site VPN with AWS and we are mostly building in-house tools against the clusters; our public-facing site runs on other, legacy architecture. So it would be better for me if public nodes were not a required dependency, but if it turns out they are, that's alright.

How to reproduce it?
Run eksctl create cluster -f cluster.yaml with the contents of the first YAML file above, the one that defines only private-subnet worker nodes.
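
A minimal reproduction sketch (the same command, plus the debug flag used further down in this thread):

eksctl create cluster -f cluster.yaml
# or, for verbose output while the nodegroup hangs:
eksctl create cluster -f cluster.yaml -v=4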

Anything else we need to know?
My AWS credentials are fine: I have admin access, and the CloudFormation, EKS, EC2, VPC, internet gateway, NAT gateway, etc. resources are all created without error.
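
A couple of checks along these lines can confirm that (my sketch, not commands from the original report):

aws sts get-caller-identity                                      # shows which IAM identity eksctl is using
eksctl utils describe-stacks --region=us-east-1 --name=staging   # lists the CloudFormation stacks eksctl created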

Versions
Please paste in the output of these commands:

#eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.23"}
#uname -a 
Darwin fractmac 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64
#kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-03-01T23:34:27Z", GoVersion:"go1.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.8-eks-7c34c0", GitCommit:"7c34c0d2f2d0f11f397d55a46945193a0e22d8f3", GitTreeState:"clean", BuildDate:"2019-03-01T22:49:39Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Also include your version of heptio-authenticator-aws
I'm using homebrew:
/usr/local/Cellar/aws-iam-authenticator/0.3.0/bin/aws-iam-authenticator
So 0.3.0 of that variant, anyway.

Logs
No specific logs. When it works with the second config there is nothing meaningful to report; when it doesn't, the error is as above.

@errordeveloper (Contributor) commented Mar 14, 2019

When I use the first config file above with eksctl create cluster -f issue-622.yaml -v=4, I see this:

2019-03-14T17:11:40Z [ℹ]  subnets for us-east-1a - public:172.20.0.0/19 private:172.20.64.0/19
2019-03-14T17:11:40Z [ℹ]  subnets for us-east-1b - public:172.20.32.0/19 private:172.20.96.0/19
2019-03-14T17:11:41Z [ℹ]  nodegroup "ng-1-workers" will use "ami-06fd8200ac0eb656d" [AmazonLinux2/1.11]
2019-03-14T17:11:41Z [ℹ]  importing SSH public key "/Users/ilya/.ssh/id_rsa.pub" as "eksctl-staging-nodegroup-ng-1-workers-f6:02:52:83:8c:62:fd:86:e0:98:24:44:f9:8d:02:0c"
2019-03-14T17:11:42Z [ℹ]  creating EKS cluster "staging" in "us-east-1" region
2019-03-14T17:11:42Z [▶]  cfg.json = \
{
    "kind": "ClusterConfig",
    "apiVersion": "eksctl.io/v1alpha4",
    "metadata": {
        "name": "staging",
        "region": "us-east-1",
        "version": "1.11",
        "tags": {
            "creator": "eksctl",
            "environment": "staging"
        }
    },
    "iam": {},
    "vpc": {
        "cidr": "172.20.0.0/16",
        "subnets": {
            "private": {
                "us-east-1a": {
                    "cidr": "172.20.64.0/19"
                },
                "us-east-1b": {
                    "cidr": "172.20.96.0/19"
                }
            },
            "public": {
                "us-east-1a": {
                    "cidr": "172.20.0.0/19"
                },
                "us-east-1b": {
                    "cidr": "172.20.32.0/19"
                }
            }
        }
    },
    "nodeGroups": [
        {
            "name": "ng-1-workers",
            "ami": "ami-06fd8200ac0eb656d",
            "amiFamily": "AmazonLinux2",
            "instanceType": "t3.medium",
            "privateNetworking": true,
            "securityGroups": {
                "withShared": true,
                "withLocal": true
            },
            "desiredCapacity": 2,
            "volumeSize": 0,
            "volumeType": "",
            "labels": {
                "alpha.eksctl.io/cluster-name": "staging",
                "alpha.eksctl.io/nodegroup-name": "ng-1-workers",
                "nodegroup-type": "backend-api-workers",
                "role": "workers"
            },
            "allowSSH": true,
            "sshPublicKeyPath": "/Users/ilya/.ssh/id_rsa.pub",
            "SSHPublicKey": "...",
            "sshPublicKeyName": "eksctl-staging-nodegroup-ng-1-workers-f6:02:52:83:8c:62:fd:86:e0:98:24:44:f9:8d:02:0c",
            "iam": {
                "withAddonPolicies": {
                    "imageBuilder": false,
                    "autoScaler": true,
                    "externalDNS": false
                }
            }
        }
    ],
    "availabilityZones": [
        "us-east-1a",
        "us-east-1b"
    ]
}
2019-03-14T17:11:42Z [ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
2019-03-14T17:11:42Z [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --name=staging'
2019-03-14T17:11:42Z [▶]  waiting for 1 tasks to complete
2019-03-14T17:11:42Z [▶]  task 0 started
2019-03-14T17:11:42Z [ℹ]  creating cluster stack "eksctl-staging-cluster"
2019-03-14T17:11:42Z [▶]  CreateStackInput = {
  Capabilities: ["CAPABILITY_IAM"],
  StackName: "eksctl-staging-cluster",
  Tags: [{
      Key: "eksctl.cluster.k8s.io/v1alpha1/cluster-name",
      Value: "staging"
    },{
      Key: "creator",
      Value: "eksctl"
    },{
      Key: "environment",
      Value: "staging"
    }],
  TemplateBody: "{\"AWSTemplateFormatVersion\":\"2010-09-09\",\"Description\":\"EKS cluster (dedicated VPC: true, dedicated IAM: true) [created and managed by eksctl]\",\"Resources\":{\"ClusterSharedNodeSecurityGroup\":{\"Type\":\"AWS::EC2::SecurityGroup\",\"Properties\":{\"GroupDescription\":\"Communication between all nodes in the cluster\",\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/ClusterSharedNodeSecurityGroup\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"ControlPlane\":{\"Type\":\"AWS::EKS::Cluster\",\"Properties\":{\"Name\":\"staging\",\"ResourcesVpcConfig\":{\"SecurityGroupIds\":[{\"Ref\":\"ControlPlaneSecurityGroup\"}],\"SubnetIds\":[{\"Ref\":\"SubnetPublicUSEAST1A\"},{\"Ref\":\"SubnetPublicUSEAST1B\"},{\"Ref\":\"SubnetPrivateUSEAST1A\"},{\"Ref\":\"SubnetPrivateUSEAST1B\"}]},\"RoleArn\":{\"Fn::GetAtt\":\"ServiceRole.Arn\"},\"Version\":\"1.11\"}},\"ControlPlaneSecurityGroup\":{\"Type\":\"AWS::EC2::SecurityGroup\",\"Properties\":{\"GroupDescription\":\"Communication between the control plane and worker nodegroups\",\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/ControlPlaneSecurityGroup\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"IngressInterNodeGroupSG\":{\"Type\":\"AWS::EC2::SecurityGroupIngress\",\"Properties\":{\"Description\":\"Allow nodes to communicate with each other (all ports)\",\"FromPort\":0,\"GroupId\":{\"Ref\":\"ClusterSharedNodeSecurityGroup\"},\"IpProtocol\":\"-1\",\"SourceSecurityGroupId\":{\"Ref\":\"ClusterSharedNodeSecurityGroup\"},\"ToPort\":65535}},\"InternetGateway\":{\"Type\":\"AWS::EC2::InternetGateway\",\"Properties\":{\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/InternetGateway\"}}]}},\"NATGateway\":{\"Type\":\"AWS::EC2::NatGateway\",\"Properties\":{\"AllocationId\":{\"Fn::GetAtt\":\"NATIP.AllocationId\"},\"SubnetId\":{\"Ref\":\"SubnetPublicUSEAST1A\"},\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/NATGateway\"}}]}},\"NATIP\":{\"Type\":\"AWS::EC2::EIP\",\"Properties\":{\"Domain\":\"vpc\"}},\"PolicyCloudWatchMetrics\":{\"Type\":\"AWS::IAM::Policy\",\"Properties\":{\"PolicyDocument\":{\"Statement\":[{\"Action\":[\"cloudwatch:PutMetricData\"],\"Effect\":\"Allow\",\"Resource\":\"*\"}],\"Version\":\"2012-10-17\"},\"PolicyName\":{\"Fn::Sub\":\"${AWS::StackName}-PolicyCloudWatchMetrics\"},\"Roles\":[{\"Ref\":\"ServiceRole\"}]}},\"PolicyNLB\":{\"Type\":\"AWS::IAM::Policy\",\"Properties\":{\"PolicyDocument\":{\"Statement\":[{\"Action\":[\"elasticloadbalancing:*\",\"ec2:CreateSecurityGroup\",\"ec2:Describe*\"],\"Effect\":\"Allow\",\"Resource\":\"*\"}],\"Version\":\"2012-10-17\"},\"PolicyName\":{\"Fn::Sub\":\"${AWS::StackName}-PolicyNLB\"},\"Roles\":[{\"Ref\":\"ServiceRole\"}]}},\"PrivateRouteTable\":{\"Type\":\"AWS::EC2::RouteTable\",\"Properties\":{\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/PrivateRouteTable\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"PrivateSubnetRoute\":{\"Type\":\"AWS::EC2::Route\",\"Properties\":{\"DestinationCidrBlock\":\"0.0.0.0/0\",\"NatGatewayId\":{\"Ref\":\"NATGateway\"},\"RouteTableId\":{\"Ref\":\"PrivateRouteTable\"}}},\"PublicRouteTable\":{\"Type\":\"AWS::EC2::RouteTable\",\"Properties\":{\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/PublicRouteTable\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"PublicSubnetRoute\":{\"Type\":\"AWS::EC2::Route\",\"Properties\":{\"DestinationCidrBlock\":\"0.0.0.0/0\",\"GatewayId\":{\"Ref\":\"InternetGateway\"},\"RouteTableId\":{\"Ref\":\"PublicRouteTable\"}}},\"RouteTableAssociationPriv
ateUSEAST1A\":{\"Type\":\"AWS::EC2::SubnetRouteTableAssociation\",\"Properties\":{\"RouteTableId\":{\"Ref\":\"PrivateRouteTable\"},\"SubnetId\":{\"Ref\":\"SubnetPrivateUSEAST1A\"}}},\"RouteTableAssociationPrivateUSEAST1B\":{\"Type\":\"AWS::EC2::SubnetRouteTableAssociation\",\"Properties\":{\"RouteTableId\":{\"Ref\":\"PrivateRouteTable\"},\"SubnetId\":{\"Ref\":\"SubnetPrivateUSEAST1B\"}}},\"RouteTableAssociationPublicUSEAST1A\":{\"Type\":\"AWS::EC2::SubnetRouteTableAssociation\",\"Properties\":{\"RouteTableId\":{\"Ref\":\"PublicRouteTable\"},\"SubnetId\":{\"Ref\":\"SubnetPublicUSEAST1A\"}}},\"RouteTableAssociationPublicUSEAST1B\":{\"Type\":\"AWS::EC2::SubnetRouteTableAssociation\",\"Properties\":{\"RouteTableId\":{\"Ref\":\"PublicRouteTable\"},\"SubnetId\":{\"Ref\":\"SubnetPublicUSEAST1B\"}}},\"ServiceRole\":{\"Type\":\"AWS::IAM::Role\",\"Properties\":{\"AssumeRolePolicyDocument\":{\"Statement\":[{\"Action\":[\"sts:AssumeRole\"],\"Effect\":\"Allow\",\"Principal\":{\"Service\":[\"eks.amazonaws.com\"]}}],\"Version\":\"2012-10-17\"},\"ManagedPolicyArns\":[\"arn:aws:iam::aws:policy/AmazonEKSServicePolicy\",\"arn:aws:iam::aws:policy/AmazonEKSClusterPolicy\"]}},\"SubnetPrivateUSEAST1A\":{\"Type\":\"AWS::EC2::Subnet\",\"Properties\":{\"AvailabilityZone\":\"us-east-1a\",\"CidrBlock\":\"172.20.64.0/19\",\"Tags\":[{\"Key\":\"kubernetes.io/role/internal-elb\",\"Value\":\"1\"},{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/SubnetPrivateUSEAST1A\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"SubnetPrivateUSEAST1B\":{\"Type\":\"AWS::EC2::Subnet\",\"Properties\":{\"AvailabilityZone\":\"us-east-1b\",\"CidrBlock\":\"172.20.96.0/19\",\"Tags\":[{\"Key\":\"kubernetes.io/role/internal-elb\",\"Value\":\"1\"},{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/SubnetPrivateUSEAST1B\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"SubnetPublicUSEAST1A\":{\"Type\":\"AWS::EC2::Subnet\",\"Properties\":{\"AvailabilityZone\":\"us-east-1a\",\"CidrBlock\":\"172.20.0.0/19\",\"Tags\":[{\"Key\":\"kubernetes.io/role/elb\",\"Value\":\"1\"},{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/SubnetPublicUSEAST1A\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"SubnetPublicUSEAST1B\":{\"Type\":\"AWS::EC2::Subnet\",\"Properties\":{\"AvailabilityZone\":\"us-east-1b\",\"CidrBlock\":\"172.20.32.0/19\",\"Tags\":[{\"Key\":\"kubernetes.io/role/elb\",\"Value\":\"1\"},{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/SubnetPublicUSEAST1B\"}}],\"VpcId\":{\"Ref\":\"VPC\"}}},\"VPC\":{\"Type\":\"AWS::EC2::VPC\",\"Properties\":{\"CidrBlock\":\"172.20.0.0/16\",\"EnableDnsHostnames\":true,\"EnableDnsSupport\":true,\"Tags\":[{\"Key\":\"Name\",\"Value\":{\"Fn::Sub\":\"${AWS::StackName}/VPC\"}}]}},\"VPCGatewayAttachment\":{\"Type\":\"AWS::EC2::VPCGatewayAttachment\",\"Properties\":{\"InternetGatewayId\":{\"Ref\":\"InternetGateway\"},\"VpcId\":{\"Ref\":\"VPC\"}}}},\"Outputs\":{\"ARN\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::ARN\"}},\"Value\":{\"Fn::GetAtt\":\"ControlPlane.Arn\"}},\"CertificateAuthorityData\":{\"Value\":{\"Fn::GetAtt\":\"ControlPlane.CertificateAuthorityData\"}},\"ClusterStackName\":{\"Value\":{\"Ref\":\"AWS::StackName\"}},\"Endpoint\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::Endpoint\"}},\"Value\":{\"Fn::GetAtt\":\"ControlPlane.Endpoint\"}},\"SecurityGroup\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::SecurityGroup\"}},\"Value\":{\"Ref\":\"ControlPlaneSecurityGroup\"}},\"ServiceRoleARN\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::ServiceRoleARN\"}},\"Value\":{\"Fn::Ge
tAtt\":\"ServiceRole.Arn\"}},\"SharedNodeSecurityGroup\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::SharedNodeSecurityGroup\"}},\"Value\":{\"Ref\":\"ClusterSharedNodeSecurityGroup\"}},\"SubnetsPrivate\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::SubnetsPrivate\"}},\"Value\":{\"Fn::Join\":[\",\",[{\"Ref\":\"SubnetPrivateUSEAST1A\"},{\"Ref\":\"SubnetPrivateUSEAST1B\"}]]}},\"SubnetsPublic\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::SubnetsPublic\"}},\"Value\":{\"Fn::Join\":[\",\",[{\"Ref\":\"SubnetPublicUSEAST1A\"},{\"Ref\":\"SubnetPublicUSEAST1B\"}]]}},\"VPC\":{\"Export\":{\"Name\":{\"Fn::Sub\":\"${AWS::StackName}::VPC\"}},\"Value\":{\"Ref\":\"VPC\"}}}}"

This means that we certainly do create both public and private subnets.

To be sure, you can unfold the CloudFormation template with jq (see the sketch after the listing); here is a simplified version:

{
  "ClusterSharedNodeSecurityGroup": {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
      "GroupDescription": "Communication between all nodes in the cluster",
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/ClusterSharedNodeSecurityGroup"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "ControlPlane": {
    "Type": "AWS::EKS::Cluster",
    "Properties": {
      "Name": "staging",
      "ResourcesVpcConfig": {
        "SecurityGroupIds": [
          {
            "Ref": "ControlPlaneSecurityGroup"
          }
        ],
        "SubnetIds": [
          {
            "Ref": "SubnetPublicUSEAST1A"
          },
          {
            "Ref": "SubnetPublicUSEAST1B"
          },
          {
            "Ref": "SubnetPrivateUSEAST1A"
          },
          {
            "Ref": "SubnetPrivateUSEAST1B"
          }
        ]
      },
      "RoleArn": {
        "Fn::GetAtt": "ServiceRole.Arn"
      },
      "Version": "1.11"
    }
  },
  "ControlPlaneSecurityGroup": {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
      "GroupDescription": "Communication between the control plane and worker nodegroups",
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/ControlPlaneSecurityGroup"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "IngressInterNodeGroupSG": {
    "Type": "AWS::EC2::SecurityGroupIngress",
    "Properties": {
      "Description": "Allow nodes to communicate with each other (all ports)",
      "FromPort": 0,
      "GroupId": {
        "Ref": "ClusterSharedNodeSecurityGroup"
      },
      "IpProtocol": "-1",
      "SourceSecurityGroupId": {
        "Ref": "ClusterSharedNodeSecurityGroup"
      },
      "ToPort": 65535
    }
  },
  "InternetGateway": {
    "Type": "AWS::EC2::InternetGateway",
    "Properties": {
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/InternetGateway"
          }
        }
      ]
    }
  },
  "NATGateway": {
    "Type": "AWS::EC2::NatGateway",
    "Properties": {
      "AllocationId": {
        "Fn::GetAtt": "NATIP.AllocationId"
      },
      "SubnetId": {
        "Ref": "SubnetPublicUSEAST1A"
      },
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/NATGateway"
          }
        }
      ]
    }
  },
  "NATIP": {
    "Type": "AWS::EC2::EIP",
    "Properties": {
      "Domain": "vpc"
    }
  },
  "PolicyCloudWatchMetrics": {
    "Type": "AWS::IAM::Policy",
    "Properties": {
      "PolicyDocument": {
        "Statement": [
          {
            "Action": [
              "cloudwatch:PutMetricData"
            ],
            "Effect": "Allow",
            "Resource": "*"
          }
        ],
        "Version": "2012-10-17"
      },
      "PolicyName": {
        "Fn::Sub": "${AWS::StackName}-PolicyCloudWatchMetrics"
      },
      "Roles": [
        {
          "Ref": "ServiceRole"
        }
      ]
    }
  },
  "PolicyNLB": {
    "Type": "AWS::IAM::Policy",
    "Properties": {
      "PolicyDocument": {
        "Statement": [
          {
            "Action": [
              "elasticloadbalancing:*",
              "ec2:CreateSecurityGroup",
              "ec2:Describe*"
            ],
            "Effect": "Allow",
            "Resource": "*"
          }
        ],
        "Version": "2012-10-17"
      },
      "PolicyName": {
        "Fn::Sub": "${AWS::StackName}-PolicyNLB"
      },
      "Roles": [
        {
          "Ref": "ServiceRole"
        }
      ]
    }
  },
  "PrivateRouteTable": {
    "Type": "AWS::EC2::RouteTable",
    "Properties": {
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/PrivateRouteTable"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "PrivateSubnetRoute": {
    "Type": "AWS::EC2::Route",
    "Properties": {
      "DestinationCidrBlock": "0.0.0.0/0",
      "NatGatewayId": {
        "Ref": "NATGateway"
      },
      "RouteTableId": {
        "Ref": "PrivateRouteTable"
      }
    }
  },
  "PublicRouteTable": {
    "Type": "AWS::EC2::RouteTable",
    "Properties": {
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/PublicRouteTable"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "PublicSubnetRoute": {
    "Type": "AWS::EC2::Route",
    "Properties": {
      "DestinationCidrBlock": "0.0.0.0/0",
      "GatewayId": {
        "Ref": "InternetGateway"
      },
      "RouteTableId": {
        "Ref": "PublicRouteTable"
      }
    }
  },
  "RouteTableAssociationPrivateUSEAST1A": {
    "Type": "AWS::EC2::SubnetRouteTableAssociation",
    "Properties": {
      "RouteTableId": {
        "Ref": "PrivateRouteTable"
      },
      "SubnetId": {
        "Ref": "SubnetPrivateUSEAST1A"
      }
    }
  },
  "RouteTableAssociationPrivateUSEAST1B": {
    "Type": "AWS::EC2::SubnetRouteTableAssociation",
    "Properties": {
      "RouteTableId": {
        "Ref": "PrivateRouteTable"
      },
      "SubnetId": {
        "Ref": "SubnetPrivateUSEAST1B"
      }
    }
  },
  "RouteTableAssociationPublicUSEAST1A": {
    "Type": "AWS::EC2::SubnetRouteTableAssociation",
    "Properties": {
      "RouteTableId": {
        "Ref": "PublicRouteTable"
      },
      "SubnetId": {
        "Ref": "SubnetPublicUSEAST1A"
      }
    }
  },
  "RouteTableAssociationPublicUSEAST1B": {
    "Type": "AWS::EC2::SubnetRouteTableAssociation",
    "Properties": {
      "RouteTableId": {
        "Ref": "PublicRouteTable"
      },
      "SubnetId": {
        "Ref": "SubnetPublicUSEAST1B"
      }
    }
  },
  "ServiceRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
      "AssumeRolePolicyDocument": {
        "Statement": [
          {
            "Action": [
              "sts:AssumeRole"
            ],
            "Effect": "Allow",
            "Principal": {
              "Service": [
                "eks.amazonaws.com"
              ]
            }
          }
        ],
        "Version": "2012-10-17"
      },
      "ManagedPolicyArns": [
        "arn:aws:iam::aws:policy/AmazonEKSServicePolicy",
        "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
      ]
    }
  },
  "SubnetPrivateUSEAST1A": {
    "Type": "AWS::EC2::Subnet",
    "Properties": {
      "AvailabilityZone": "us-east-1a",
      "CidrBlock": "172.20.64.0/19",
      "Tags": [
        {
          "Key": "kubernetes.io/role/internal-elb",
          "Value": "1"
        },
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/SubnetPrivateUSEAST1A"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "SubnetPrivateUSEAST1B": {
    "Type": "AWS::EC2::Subnet",
    "Properties": {
      "AvailabilityZone": "us-east-1b",
      "CidrBlock": "172.20.96.0/19",
      "Tags": [
        {
          "Key": "kubernetes.io/role/internal-elb",
          "Value": "1"
        },
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/SubnetPrivateUSEAST1B"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "SubnetPublicUSEAST1A": {
    "Type": "AWS::EC2::Subnet",
    "Properties": {
      "AvailabilityZone": "us-east-1a",
      "CidrBlock": "172.20.0.0/19",
      "Tags": [
        {
          "Key": "kubernetes.io/role/elb",
          "Value": "1"
        },
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/SubnetPublicUSEAST1A"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "SubnetPublicUSEAST1B": {
    "Type": "AWS::EC2::Subnet",
    "Properties": {
      "AvailabilityZone": "us-east-1b",
      "CidrBlock": "172.20.32.0/19",
      "Tags": [
        {
          "Key": "kubernetes.io/role/elb",
          "Value": "1"
        },
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/SubnetPublicUSEAST1B"
          }
        }
      ],
      "VpcId": {
        "Ref": "VPC"
      }
    }
  },
  "VPC": {
    "Type": "AWS::EC2::VPC",
    "Properties": {
      "CidrBlock": "172.20.0.0/16",
      "EnableDnsHostnames": true,
      "EnableDnsSupport": true,
      "Tags": [
        {
          "Key": "Name",
          "Value": {
            "Fn::Sub": "${AWS::StackName}/VPC"
          }
        }
      ]
    }
  },
  "VPCGatewayAttachment": {
    "Type": "AWS::EC2::VPCGatewayAttachment",
    "Properties": {
      "InternetGatewayId": {
        "Ref": "InternetGateway"
      },
      "VpcId": {
        "Ref": "VPC"
      }
    }
  }
}
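
For reference, a jq invocation along these lines will unfold the deployed template (a sketch, assuming the stack name from the log above and that the cluster stack was actually created):

aws cloudformation get-template --stack-name eksctl-staging-cluster --region us-east-1 \
  | jq '.TemplateBody.Resources'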

@errordeveloper (Contributor)

So there are certainly 2x public and 2x private subnets, along with internet and NAT gateways. Could this be related to #605, and/or somehow specific to your AWS account (if you have Direct Connect or something of that kind)?

@jcoletaylor (Author)

Hey, thanks for the detailed response! I looked at #605 and I don't know if it's specifically related, since eksctl created the entire VPC and routes in those few attempts. I have it working now, but only by explicitly crafting my subnets in the VPC and setting up the NAT and internet gateways beforehand, then launching the same basic config file with no private worker nodes, and it came up fine. I honestly don't know :\

@errordeveloper (Contributor) commented Mar 14, 2019

So I've managed to reproduce it, and it turns out ami-06fd8200ac0eb656d is the culprit: it's an Ubuntu image, and it's for 1.10 (not 1.11). Also, one must specify amiFamily: Ubuntu1804 for the Ubuntu node bootstrap to function correctly (we assume Amazon Linux 2 by default).

I'll re-test with the default AMI before I can confirm whether this is a bug or not.

@errordeveloper changed the title from "Private Subnet Worker Nodes not attaching" to "nodegroup with custom AMI doesn't join the cluster" on Mar 14, 2019
@errordeveloper (Contributor)

I can confirm that removing ami: ami-06fd8200ac0eb656d from the config fixes this. I opened #637 to improve UX.
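
In other words, a nodegroup along these lines works (a sketch of the fixed variant, with the ami line dropped so eksctl resolves the default EKS-optimized Amazon Linux 2 image for the cluster version):

nodeGroups:
  - name: ng-1-workers
    instanceType: t3.medium
    desiredCapacity: 2
    privateNetworking: true
    # no `ami:` line - eksctl picks the default AmazonLinux2 AMI
    allowSSH: true
    sshPublicKeyPath: /Users/petetaylor/.ssh/aws_stag_vpc.pub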

@errordeveloper (Contributor) commented Mar 14, 2019

To be clear, if you must use Ubuntu, you'd have to use this:

apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig

metadata:
  name: staging
  region: us-east-1
  version: 1.10
  tags: 
    environment: staging
    creator: eksctl

vpc:
  cidr: "172.20.0.0/16"

nodeGroups:
  - name: ng-1-workers
    labels:
      role: workers
      nodegroup-type: backend-api-workers
    iam:
      withAddonPolicies:
        autoScaler: true
    instanceType: t3.medium
    desiredCapacity: 2
    privateNetworking: true
    amiFamily: Ubuntu1804
    # ami: ami-06fd8200ac0eb656d  # (optional)
    allowSSH: true
    sshPublicKeyPath: /Users/petetaylor/.ssh/aws_stag_vpc.pub

availabilityZones: ["us-east-1a", "us-east-1b"]

@eatonphil commented Oct 1, 2019

I experienced the same thing (hanging indefinitely while waiting for the nodegroup's nodes to become available) using a CentOS worker node AMI built from this repo, following this guide.

Any suggestions?

@jcoletaylor (Author)

What I think we discovered was that the AMI was built for a different Kubernetes version, so its nodes could not join a cluster running a newer version, which broke the whole install.
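
A quick way to check for that mismatch (my sketch, not from the original thread) is to compare the control-plane version with the kubelet baked into the AMI:

aws eks describe-cluster --name staging --query 'cluster.version' --output text
# then, on an instance launched from the candidate AMI:
kubelet --version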

@ranjithwingrider

I also faced the same issue while using eksctl with command-line parameters. I added --node-ami-family=Ubuntu1804 together with a supported Ubuntu AMI for my region. For example, I created the EKS cluster in the Asia Pacific (Mumbai) region, ap-south-1, got the supported Ubuntu image from https://cloud-images.ubuntu.com/aws-eks/, and then passed --node-ami-family=Ubuntu1804 in the command.
Example command:

eksctl create cluster \
 --name test-eks \
 --version 1.14 \
 --region ap-south-1 \
 --nodegroup-name standard-workers \
 --node-type t3.medium \
 --nodes 1 \
 --nodes-min 1 \
 --nodes-max 1 \
 --node-ami ami-0ebcd0d7eeb363724 \
 --ssh-access=true \
 --ssh-public-key=test-eks-key-pair \
 --node-ami-family=Ubuntu1804
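
Once the stack completes, a quick check that the Ubuntu worker actually joined (my addition, assuming kubectl is pointed at the new cluster):

kubectl get nodes -o wide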

@eatonphil

I eventually found the issue by SSH-ing into the worker nodes and examining the kubelet logs. My particular problem was that some of the Docker images present were corrupted, so I removed all Docker images, restarted Docker and the kubelet, and things worked.

I have no reason to believe this is a common scenario, but the general approach is to just SSH into a worker node and look at the logs.
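
For anyone else debugging this, the node-side inspection looks roughly like this (a sketch, assuming a systemd-based worker and SSH access as configured above; the user, key, and IP are illustrative):

ssh -i ~/.ssh/aws_stag_vpc ec2-user@<node-private-ip>
journalctl -u kubelet --no-pager | tail -n 100   # kubelet logs, where the join failure usually shows up
sudo docker images                               # inspect local images
sudo docker system prune -a                      # remove unused images (clearing corrupted images is what fixed it here)
sudo systemctl restart docker kubelet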
