Unable to deploy EKS-A to vSphere cluster #7954
Comments
I would check the
@Darth-Weider thanks for the reply. For those who are having similar issues, here is a TL;DR:
I've just finally figured out what is going on here. There were a few things that weren't really clear when reading the docs:
The fun fact is that this is not consistent. I've created the same config multiple times in the same environment, and sometimes the process fails at the end with that error. Here is the config:
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
name: awsemu
spec:
clusterNetwork:
cniConfig:
cilium: {}
pods:
cidrBlocks:
- 172.18.0.0/16
services:
cidrBlocks:
- 10.96.0.0/12
controlPlaneConfiguration:
count: 3
endpoint:
host: "172.16.1.1"
machineGroupRef:
kind: VSphereMachineConfig
name: awsemu-cp
datacenterRef:
kind: VSphereDatacenterConfig
name: datacenter
externalEtcdConfiguration:
count: 3
machineGroupRef:
kind: VSphereMachineConfig
name: awsemu-etcd
kubernetesVersion: "1.29"
managementCluster:
name: awsemu
workerNodeGroupConfigurations:
- count: 1
machineGroupRef:
kind: VSphereMachineConfig
name: awsemu
name: md-0
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereDatacenterConfig
metadata:
name: datacenter
spec:
datacenter: datacenter
insecure: false
network: workload
server: 192.168.8.12
thumbprint: "27:44:A2:74:89:B4:D3:4E:97:30:D7:AF:3B:88:06:F4:08:0C:4F:D7"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
name: awsemu-cp
spec:
cloneMode: linkedClone
datastore: vsandatastore
folder: Kubernetes/Management/Control Plane
memoryMiB: 8192
numCPUs: 2
osFamily: bottlerocket
resourcePool: /datacenter/host/hwcluster/Resources
storagePolicyName: ""
users:
- name: ec2-user
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGidVzdPHSLPNq7i4+r1AD2bfAQmEC8NmZM1V0vN7jMIW2QZSflL2LrCpGk0969FHesOUTM1x61B5oYepsLjYgSKDC2mNxIg2jZONPYCg30fxE5vOxWUJObCGuc4trKfz9DLPx7+C3fGgXQaFmnugMgRbqYurdrr8HDeXsavwN361x/MesKpY4E26SBt/RG/sZEssVnzeIPbM8S9LDOX62znFYIXRlgmmx9un68TqQpMti6CnIWUlYwx90MJkV0avL5BeSg9ex3JxYH1THQw3tcj5gyh9GY9yWVxXA7bs3wh5vd8JAJEtPpeqaafRaqXfBFWzC3/L21GxVCwgvGAjovhdDGk3vn6PNRKf4b1MydHnVK7/lZnpNpenDYCszSEebkS5joqehpkaJ4eED1ACvJeh/0urupu47RMN6DcwLUR7j3o7sxcXZK31lecgogC7yvC5eZGK/B6rwHyV3xX7KaVcfabJJeiiJgrb2cKesiKDFgR8DlQ+sUrdwUIcsxsoOskYZJQuvH/h2Gi7lZv71uABnQLvcAeF6OSj7vnrsQ7oUKdcJhAfoRdJCOEt1PtgyDfe2WJ9gH3KRbuHxnNVyQKNZaI5OtEPCxlPIyXbGQnsTwZ1AiWj/RYbj3DP3aCM3Iu7Lg7z/dVGSnRfWJk0zdcZekGch0O43H0EX7611kQ==
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
name: awsemu
spec:
cloneMode: linkedClone
datastore: vsandatastore
folder: Kubernetes/Management/Worker Nodes
memoryMiB: 8192
numCPUs: 2
osFamily: bottlerocket
resourcePool: /datacenter/host/hwcluster/Resources
storagePolicyName: ""
users:
- name: ec2-user
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGidVzdPHSLPNq7i4+r1AD2bfAQmEC8NmZM1V0vN7jMIW2QZSflL2LrCpGk0969FHesOUTM1x61B5oYepsLjYgSKDC2mNxIg2jZONPYCg30fxE5vOxWUJObCGuc4trKfz9DLPx7+C3fGgXQaFmnugMgRbqYurdrr8HDeXsavwN361x/MesKpY4E26SBt/RG/sZEssVnzeIPbM8S9LDOX62znFYIXRlgmmx9un68TqQpMti6CnIWUlYwx90MJkV0avL5BeSg9ex3JxYH1THQw3tcj5gyh9GY9yWVxXA7bs3wh5vd8JAJEtPpeqaafRaqXfBFWzC3/L21GxVCwgvGAjovhdDGk3vn6PNRKf4b1MydHnVK7/lZnpNpenDYCszSEebkS5joqehpkaJ4eED1ACvJeh/0urupu47RMN6DcwLUR7j3o7sxcXZK31lecgogC7yvC5eZGK/B6rwHyV3xX7KaVcfabJJeiiJgrb2cKesiKDFgR8DlQ+sUrdwUIcsxsoOskYZJQuvH/h2Gi7lZv71uABnQLvcAeF6OSj7vnrsQ7oUKdcJhAfoRdJCOEt1PtgyDfe2WJ9gH3KRbuHxnNVyQKNZaI5OtEPCxlPIyXbGQnsTwZ1AiWj/RYbj3DP3aCM3Iu7Lg7z/dVGSnRfWJk0zdcZekGch0O43H0EX7611kQ==
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
name: awsemu-etcd
spec:
cloneMode: linkedClone
datastore: vsandatastore
folder: Kubernetes/Management/ETCD
memoryMiB: 8192
numCPUs: 2
osFamily: bottlerocket
resourcePool: /datacenter/host/hwcluster/Resources
storagePolicyName: ""
users:
- name: ec2-user
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDGidVzdPHSLPNq7i4+r1AD2bfAQmEC8NmZM1V0vN7jMIW2QZSflL2LrCpGk0969FHesOUTM1x61B5oYepsLjYgSKDC2mNxIg2jZONPYCg30fxE5vOxWUJObCGuc4trKfz9DLPx7+C3fGgXQaFmnugMgRbqYurdrr8HDeXsavwN361x/MesKpY4E26SBt/RG/sZEssVnzeIPbM8S9LDOX62znFYIXRlgmmx9un68TqQpMti6CnIWUlYwx90MJkV0avL5BeSg9ex3JxYH1THQw3tcj5gyh9GY9yWVxXA7bs3wh5vd8JAJEtPpeqaafRaqXfBFWzC3/L21GxVCwgvGAjovhdDGk3vn6PNRKf4b1MydHnVK7/lZnpNpenDYCszSEebkS5joqehpkaJ4eED1ACvJeh/0urupu47RMN6DcwLUR7j3o7sxcXZK31lecgogC7yvC5eZGK/B6rwHyV3xX7KaVcfabJJeiiJgrb2cKesiKDFgR8DlQ+sUrdwUIcsxsoOskYZJQuvH/h2Gi7lZv71uABnQLvcAeF6OSj7vnrsQ7oUKdcJhAfoRdJCOEt1PtgyDfe2WJ9gH3KRbuHxnNVyQKNZaI5OtEPCxlPIyXbGQnsTwZ1AiWj/RYbj3DP3aCM3Iu7Lg7z/dVGSnRfWJk0zdcZekGch0O43H0EX7611kQ==
---
This also leaves behind all the VMs that were created, and the cluster ends up in a state where it isn't ready, nor can I delete it with eksctl, so all we can do is manually stop and delete each VM...
@galvesribeiro Can you try fullClone instead of linkedClone? Also, the CP node IP address is set to "172.16.1.1"? Is that your VLAN gateway IP? And does your EKS-A VLAN have access to your vCenter API endpoint?
Full clone is what was causing vSphere to fail with that message, as you can see in the picture (A specified parameter was not correct: spec.config.deviceChange[0].operation). I was only able to get past it and deploy the VMs with linkedClone.
No. The network is:
Yep. vCenter is 192.168.8.12, which is routable through the 172.16.0.1 gateway.
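For reference, the field at the center of this exchange is cloneMode on each VSphereMachineConfig. Below is a minimal excerpt reusing the names from the config above; only the cloneMode value differed between the failing fullClone run and the working linkedClone run, and the comment summarizes the behavior reported in this thread rather than anything new:

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  name: awsemu-cp
spec:
  # fullClone produced "A specified parameter was not correct:
  # spec.config.deviceChange[0].operation" in this environment;
  # linkedClone allowed the VMs to be cloned from the template.
  cloneMode: linkedClone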
@galvesribeiro I believe I hit the same problem you had with that error. I posted my results over in #8123 (comment), where the poster ran into the exact same error.

Pretty sure there's a race condition toward the end of the entire cluster creation process. The CLI is supposed to write the kubeconfig for your new EKSA cluster to your local filesystem, then start using that kubeconfig to connect to the new cluster and lift all the EKSA controllers and CRDs into it. The very first action it takes to do that is to run a kubectl command against the new cluster.

If the kubeconfig file doesn't exist yet, that command will fail, and because there is no kubeconfig set, it falls back to the kubectl default, which is to connect to a local Kubernetes API at localhost:8080. Sometimes when I ran it the kubeconfig would be there first, and it would succeed. Other times it wouldn't, and it would bomb with this error. That's why it's inconsistent.

The way I solved it was to fork the EKSA plugin and add a 30s sleep before it tries to run commands against the new cluster 😏. That has worked for me every time since.

Another poster on that ticket suggested ensuring that the Kubernetes versions match exactly between kubectl and the cluster, which I haven't tested yet (all my clusters are working). However, you could try that as well. Hope that helps!
Interesting idea. The thing is that consistently changing from fullClone to linkedClone made it work. If the delay, as you suggest, is the culprit, I'd guess the fix should be included among the many other waits that happen across the steps. The overall feeling is that EKS-A is unfortunately not being given much attention and/or maintenance if this issue has been open for that long. But thanks for the reply, I'll have a look at forking.
Hey folks! Just to remind the team, this is still an issue. Today we needed to increase the /tmp directory on the Bottlerocket VM disk. We aren't able to do that on running nodes, so we thought we would set a bigger disk size in the machine config. Since that isn't possible with linkedClone, the logical action was "yes, it will be a full copy of the disk, but at least we can use the bigger disk; let's change to fullClone", and we ran straight into this error again. I wish the EKS team would have a look at this. This issue is from April and we haven't even got a reply on what is wrong, or any proper workaround, yet...
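For context on the disk-size change described above, here is a sketch of the kind of edit involved, assuming the diskGiB field on VSphereMachineConfig; the 50 GiB value is illustrative and not from this thread. Growing the disk beyond what the template provides rules out a linked clone, which is why the cloneMode switch (and this bug) comes back into play:

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  name: awsemu
spec:
  # Illustrative: request a bigger disk than the template's default.
  # A resized disk can't share the template's disk, so linkedClone is
  # not an option and cloneMode has to become fullClone, which is
  # exactly the mode that triggers the deviceChange[0].operation error.
  cloneMode: fullClone
  diskGiB: 50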
What happened:
Unable to deploy EKS-A on ESXi 8 U1
What you expected to happen:
The initial cluster to be deployed
How to reproduce it (as minimally and precisely as possible):
Just follow the process from the documentation to deploy the initial cluster.
When it tries to deploy the first etcd VM from the templates, the VM is created, but shortly after creation it is removed and I see the following error:
I've tried multiple Bottlerocket versions from 1.26 to 1.29 and all of them fail. I also tried on two completely separate ESXi/vSphere clusters with the same results.
Environment:
Latest EKS-A CLI (from brew) on macOS Sonoma (fully updated) deploying to ESXi/vCenter/vSAN 8U1.