
Can't get cluster members #5143

Closed

oussexist opened this issue Jul 5, 2024 · 20 comments
Labels
kind/question Indicates an issue that is a support question.

Comments

@oussexist

oussexist commented Jul 5, 2024

Hello there,
I have an on-prem cluster and a cloud cluster. If I run the init on the on-prem cluster, it becomes the control-plane host cluster. When I want to join members, the create token command gives me a register command, but running that command on the cloud cluster fails to register, since the advertised IP address belongs to the local network, although the local cluster has network access and can ping google.com just fine. What should I do?
Do I need to init on the cloud cluster and join the local one instead?
And if I do the init on the cloud cluster and want to join the local cluster, should I manually copy the API server config (the karmada config file) to the local cluster, and when joining, give the path to a file I manually create with the same content as the karmada config generated on the cloud cluster?

P.S.: I tried karmada init on the local cluster and joined it, but when I try to use the karmada get cluster command I get this: error: failed to list all member clusters in control plane, err: the server could not find the requested resource (get clusters.cluster.karmada.io)

@RainbowMango
Member

but when I try to use the karmada get cluster command I get this: error: failed to list all member clusters in control plane, err: the server could not find the requested resource (get clusters.cluster.karmada.io)

Let's get started with this.
This error is usually caused by using the wrong kubeconfig. You might need to switch the kubeconfig or context to Karmada's kubeconfig, not the host cluster's.
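For example, a minimal sketch (assuming the default path where kubectl karmada init writes the control-plane kubeconfig; adjust the path if yours differs):

# Point kubectl at the Karmada control plane instead of the host cluster.
export KUBECONFIG=/etc/karmada/karmada-apiserver.config
kubectl get clusters

# Or pass the kubeconfig explicitly without changing the environment:
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get clusters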

@oussexist
Author

oussexist commented Jul 7, 2024

Hello there,
First of all, thanks for your reply. I did the karmada init on the local cluster, and I want it to be a member and the control plane at the same time. Can't this be done?
As the documentation shows, all I need to do is run kubectl karmada init on one of my clusters, isn't it?
Also, I wanted to add something: what if my cluster has an internal IP and a public IP? The kubeconfig created when initializing the cluster uses the local one, so karmada is initialized with the internal IP. Even if I edit the karmada config file and replace the internal IP with the public one, it fails because the certificate is only valid for the internal IP.

@RainbowMango
Member

cc @chaosi-zju for help

@oussexist
Author

oussexist commented Jul 8, 2024

Ok, so as you told me, I targeted the karmada config using this command:
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get clusters
and I got the local clusters that I joined before (the docker-desktop cluster and the host one).

Joining the cloud cluster is the issue that remains.
My local cluster, which is the host, has a public IP assigned to its master, but even so, when I initialize karmada on it, it uses the local IP address (the one in the kubeconfig). I don't know how I'll get the cloud member to join, since it always tries to reach the local IP.

@chaosi-zju
Member

Hi @oussexist, I will answer your two questions above.

First

I did the karmada init on the local cluster, and I want it to be a member and the control plane at the same time. Can't this be done?

I spent some time today verifying this for you; here are two cases:

  • If your local host machine has two different clusters, you can certainly use one cluster as the control plane cluster and the other as a member cluster.
  • If you can only build one cluster on your local host machine, that cluster can still be used as both the control plane cluster and a member cluster, but we generally don't use it this way.

Second

so karmada is initialized with the internal IP. Even if I edit the karmada config file and replace the internal IP with the public one, it fails because the certificate is only valid for the internal IP.

Here is a command option for kubectl karmada (or karmadactl) that may resolve your problem. Run kubectl karmada init -h and you will see:

...
  # Specify external IPs(load balancer or HA IP) which used to sign the certificate
  karmadactl init --cert-external-ip 10.235.1.2 --cert-external-dns www.karmada.io

Options:
    --cert-external-dns='':
        the external DNS of Karmada certificate (e.g localhost,localhost.com)

    --cert-external-ip='':
        the external IP of Karmada certificate (e.g 192.168.1.2,172.16.1.2)
...

So you can add --cert-external-ip=xx.xx.xx.xx to your kubectl karmada init command, where xx.xx.xx.xx is your public IP.

Then, even though we initialize karmada using the local IP, you can change the IP in the kubeconfig to the public IP, and with the --cert-external-ip=xx.xx.xx.xx option you won't encounter certificate issues.
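For example, a minimal sketch (203.0.113.10 is a placeholder for your public IP; keep any other init flags you already use):

# Sign the Karmada API server certificate for the public IP as well,
# so the kubeconfig can later be pointed at the public address without TLS errors.
kubectl karmada init --cert-external-ip=203.0.113.10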

@oussexist
Author

oussexist commented Jul 8, 2024

Hello there,
I want to thank you for your time. So I did as you told me with the cert-external-ip flag, and the certificate problem is gone, but it still times out because it still reaches for the local IP address.
I tried to join it with both pull and push mode and couldn't with either, because of the network issue between the 2 clusters.

  • using the join command (I moved the karmada API config file to the cloud cluster): Unable to connect to the server: dial tcp xxx.x: i/o timeout
  • using the register command, it stays stuck on: [karmada-agent-start] Waiting to perform the TLS Bootstrap

P.S.: I also tried to initialize the cluster with the public address as the endpoint, and it appears in the kubeconfig file, but karmada always uses the internal one. Example:

 kubectl karmada init
I0708 10:09:21.998604    3797 deploy.go:250] kubeconfig file: , kubernetes: https://publicIP:6443
I0708 10:09:22.204984    3797 deploy.go:270] karmada apiserver ip: [the-local-ip]
.
.
.
.
.

Edited: Well, I found an advertise flag on the init command that may be helpful in this case; I am checking it right now.

@oussexist
Author

oussexist commented Jul 8, 2024

Hello again @chaosi-zju,
So I tried to init on the cloud cluster (I updated the kubeconfig file to use the public IP as the endpoint, and the karmada API server uses the same public IP), using this flag: kubectl karmada init --karmada-apiserver-advertise-address=my-public-ip
but it's the same result: after this step it gets stuck:
I0708 15:49:06.409095 11751 idempotency.go:291] Service karmada-system/karmada-apiserver has been created or updated.
and then I get this error:
error: wait for Deployment(karmada-system/karmada-apiserver) rollout: context deadline exceeded: client rate limiter Wait returned an error: context deadline exceeded

This is the full log:

ubuntu@master:~$ kubectl karmada init --karmada-apiserver-advertise-address=my-public-ip
I0708 15:48:30.430370   11751 deploy.go:250] kubeconfig file: , kubernetes: https://my-public-ip:6443
I0708 15:48:30.500788   11751 deploy.go:270] karmada apiserver ip: [my-public-ip]
I0708 15:48:33.577406   11751 cert.go:246] Generate ca certificate success.
I0708 15:48:35.266299   11751 cert.go:246] Generate karmada certificate success.
I0708 15:48:36.167317   11751 cert.go:246] Generate apiserver certificate success.
I0708 15:48:39.011770   11751 cert.go:246] Generate front-proxy-ca certificate success.
I0708 15:48:40.025761   11751 cert.go:246] Generate front-proxy-client certificate success.
I0708 15:48:43.695485   11751 cert.go:246] Generate etcd-ca certificate success.
I0708 15:48:44.897303   11751 cert.go:246] Generate etcd-server certificate success.
I0708 15:48:49.582634   11751 cert.go:246] Generate etcd-client certificate success.
I0708 15:48:49.582991   11751 deploy.go:366] download crds file:https://github.com/karmada-io/karmada/releases/download/v1.10.2/crds.tar.gz
Downloading...[ 100.00% ]
Download complete.
I0708 15:48:50.143988   11751 deploy.go:608] Create karmada kubeconfig success.
I0708 15:48:50.183226   11751 idempotency.go:267] Namespace karmada-system has been created or updated.
I0708 15:48:50.311282   11751 idempotency.go:291] Service karmada-system/etcd has been created or updated.
I0708 15:48:50.311310   11751 deploy.go:432] Create etcd StatefulSets
I0708 15:49:06.346320   11751 deploy.go:441] Create karmada ApiServer Deployment
I0708 15:49:06.409095   11751 idempotency.go:291] Service karmada-system/karmada-apiserver has been created or updated.
error: wait for Deployment(karmada-system/karmada-apiserver) rollout: context deadline exceeded: client rate limiter Wait returned an error: context deadline exceeded

@chaosi-zju
Member

I0708 15:49:06.409095 11751 idempotency.go:291] Service karmada-system/karmada-apiserver has been created or updated.
error: wait for Deployment(karmada-system/karmada-apiserver) rollout: context deadline exceeded: client rate limiter Wait returned an error: context deadline exceeded

Hi @oussexist, sorry to hear about this error.

  1. Can you provide me with the current status of karmada-apiserver, like the output of kubectl describe and kubectl logs?
  2. Do you have to use karmadactl init to install? Have you tried other installation methods such as karmada-operator or helm?
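For reference, a minimal helm-based sketch, assuming you install from a local checkout of the karmada repository (the chart path and values may differ between versions):

git clone https://github.com/karmada-io/karmada.git
cd karmada
# Install the Karmada control plane into the host cluster from the bundled chart.
helm install karmada -n karmada-system --create-namespace --dependency-update ./charts/karmada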

@oussexist
Author

oussexist commented Jul 9, 2024

By the way, this error happens only on the cloud cluster, although its NSG allows all traffic, so I don't think it's a security group issue.
I installed karmada through krew on both the local and the cloud cluster in the same way; it works just fine on the local one but not on the cloud one!
Here you are:

 kubectl describe deployment karmada-apiserver -n karmada-system
kubectl get pods -n karmada-system -l app=karmada-apiserver
Name:                   karmada-apiserver
Namespace:              karmada-system
CreationTimestamp:      Mon, 08 Jul 2024 21:50:09 +0000
Labels:                 karmada.io/bootstrapping=app-defaults
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=karmada-apiserver
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=karmada-apiserver
  Containers:
   karmada-apiserver:
    Image:      registry.k8s.io/kube-apiserver:v1.27.11
    Port:       5443/TCP
    Host Port:  0/TCP
    Command:
      kube-apiserver
      --allow-privileged=true
      --authorization-mode=Node,RBAC
      --client-ca-file=/etc/karmada/pki/ca.crt
      --enable-bootstrap-token-auth=true
      --etcd-cafile=/etc/karmada/pki/etcd-ca.crt
      --etcd-certfile=/etc/karmada/pki/etcd-client.crt
      --etcd-keyfile=/etc/karmada/pki/etcd-client.key
      --etcd-servers=https://etcd-0.etcd.karmada-system.svc.cluster.local:2379
      --bind-address=0.0.0.0
      --kubelet-client-certificate=/etc/karmada/pki/karmada.crt
      --kubelet-client-key=/etc/karmada/pki/karmada.key
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --disable-admission-plugins=StorageObjectInUseProtection,ServiceAccount
      --runtime-config=
      --apiserver-count=1
      --secure-port=5443
      --service-account-issuer=https://kubernetes.default.svc.cluster.local
      --service-account-key-file=/etc/karmada/pki/karmada.key
      --service-account-signing-key-file=/etc/karmada/pki/karmada.key
      --service-cluster-ip-range=10.96.0.0/12
      --proxy-client-cert-file=/etc/karmada/pki/front-proxy-client.crt
      --proxy-client-key-file=/etc/karmada/pki/front-proxy-client.key
      --requestheader-allowed-names=front-proxy-client
      --requestheader-client-ca-file=/etc/karmada/pki/front-proxy-ca.crt
      --requestheader-extra-headers-prefix=X-Remote-Extra-
      --requestheader-group-headers=X-Remote-Group
      --requestheader-username-headers=X-Remote-User
      --tls-cert-file=/etc/karmada/pki/apiserver.crt
      --tls-private-key-file=/etc/karmada/pki/apiserver.key
      --tls-min-version=VersionTLS13
    Liveness:     http-get https://:5443/livez delay=15s timeout=5s period=30s #success=1 #failure=3
    Readiness:    http-get https://:5443/readyz delay=0s timeout=5s period=30s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/karmada/pki from karmada-cert (ro)
  Volumes:
   karmada-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  karmada-cert
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   karmada-apiserver-56b85d8bd (1/1 replicas created)
Events:          <none>
NAME                                READY   STATUS             RESTARTS          AGE
karmada-apiserver-56b85d8bd-8v5l4   0/1     CrashLoopBackOff   119 (2m26s ago)   9h

@chaosi-zju
Member

karmada-apiserver-56b85d8bd-8v5l4 0/1 CrashLoopBackOff 119 (2m26s ago) 9h

Using the kubectl logs -p parameter, you can print the logs of this pod from before it crashed; then we can dig into why it crashed~

just like kubectl logs -p karmada-apiserver-56b85d8bd-8v5l4 -n karmada-system

@oussexist
Author

kubectl logs -p karmada-apiserver-56b85d8bd-8v5l4 -n karmada-system
Flag --apiserver-count has been deprecated, apiserver-count is deprecated and will be removed in a future version.
I0709 07:51:42.828196       1 server.go:554] external host was not specified, using 10.244.171.72
I0709 07:51:42.829556       1 server.go:166] Version: v1.27.11
I0709 07:51:42.829606       1 server.go:168] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0709 07:51:43.951884       1 shared_informer.go:311] Waiting for caches to sync for node_authorizer
I0709 07:51:44.001634       1 plugins.go:158] Loaded 9 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0709 07:51:44.001652       1 plugins.go:161] Loaded 12 validating admission controller(s) successfully in the following order: LimitRanger,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,ClusterTrustBundleAttest,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota.
E0709 07:52:04.012907       1 run.go:74] "command failed" err="context deadline exceeded"

@chaosi-zju
Member

Probably the same unresolved issue: #5105

Can you refer to #5105 (comment) and check whether it gives you some help?

@chaosi-zju
Member

As you said:

I installed karmada through krew on both the local and the cloud cluster in the same way; it works just fine on the local one but not on the cloud one!

I suspect it has something to do with the container network of your cloud environment, which prevents the karmada-apiserver from connecting to etcd.
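A minimal checklist sketch for that theory (the pod and service names below are the defaults created by kubectl karmada init; adjust if yours differ):

# Is the etcd StatefulSet pod running, and what do its logs say?
kubectl get pods -n karmada-system
kubectl logs etcd-0 -n karmada-system

# Can pods resolve the etcd service the apiserver is configured with
# (--etcd-servers=https://etcd-0.etcd.karmada-system.svc.cluster.local:2379)?
kubectl run dns-test --rm -it --restart=Never --image=busybox -n karmada-system -- \
  nslookup etcd-0.etcd.karmada-system.svc.cluster.local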

@chaosi-zju
Member

Hi @oussexist, so did you come to any new conclusions later?

@oussexist
Author

oussexist commented Jul 10, 2024

Hi @chaosi-zju, sorry, I was kind of busy.
Well, I think, as you said, I have some networking issues on my cluster. I'll try to work with an AKS cluster, although I still need to find a solution for the old one.

@oussexist
Author

oussexist commented Jul 11, 2024

Ok so, hello again and sorry for being a bit late. Anyway, I initialized karmada on my local cluster with a public IP and made sure the 32443 port is accessible (this is kind of important), and then, after fixing the cloud cluster's network issues, it connected fine.
Now I'll move on to the propagation part, so that even if one of my clusters goes down the other holds the deployments until the first one is back up; I hope this will work fine!
P.S.: as I told you, I am using only 2 clusters, so the local one is both the control plane and a member (I can't have another cluster as the control plane due to resource limitations).
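A minimal sketch of that propagation setup, assuming the member clusters are registered as "local" and "cloud" and the workload is a Deployment named nginx (all names are placeholders):

# Apply against the Karmada control plane, not the host cluster.
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f - <<EOF
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - local
        - cloud
    replicaScheduling:
      replicaSchedulingType: Duplicated   # keep a full copy running in each cluster
EOF

With Duplicated scheduling, each member cluster gets its own full copy of the Deployment, so the workload keeps running in one cluster if the other goes down.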

@chaosi-zju
Member

chaosi-zju commented Jul 12, 2024

and then, after fixing the cloud cluster's network issues, it connected fine.

Hi, does the network issue refer to run.go:74] "command failed" err="context deadline exceeded"?

If yes, then I'm curious how you fixed this network issue in the end, haha

@oussexist
Author

oussexist commented Jul 15, 2024

Hello again, I was on holiday.
I kind of forgot what I did exactly, haha, but as I remember, I made sure the cloud cluster uses the public IP in the kubeconfig via the advertise flag (I am initializing the kubernetes cluster with an ansible playbook, so in the kubeadm init task I added the advertise flag to set the public IP there, because by default it picks the 10.xx.xx.xx one). Also, as I told you, the control plane initialized on the local cluster was not accessible from outside since it's local, so I made sure it is accessible and opened the karmada port.
That's it as far as I remember, haha. Anyway, I tried the propagation and it successfully created the deployment in both clusters, but I am curious about high availability: what will happen if the local one goes down? Normally we won't have high availability, since it's the control plane. And what if the cloud cluster goes down? I don't really master this at all, so should I access each deployment separately, or on the same endpoint, or what? Can you please just guide me a bit?

Edited: Also, I have a little problem: when I create a deployment through the propagation file and get deployments, I find that the local deployment is not ready!

@oussexist
Author

oussexist commented Jul 15, 2024

and then, after fixing the cloud cluster's network issues, it connected fine.

Hi, does the network issue refer to run.go:74] "command failed" err="context deadline exceeded"?

If yes, then I'm curious how you fixed this network issue in the end, haha

I think adding the --control-plane-endpoint flag to the kubeadm init command and putting an accessible IP there fixed it, so it could be reached from outside, but I'm not sure, because I ran the init command on the local one and, since it worked, I didn't try again on the cloud one.
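A minimal sketch of that kubeadm invocation (203.0.113.10 is a placeholder; --apiserver-advertise-address only works if the public IP is actually bound to a local interface, otherwise --control-plane-endpoint plus --apiserver-cert-extra-sans is the usual route):

# Make the control plane reachable from outside, so kubeconfigs generated later
# point at an address the other cluster can reach.
kubeadm init \
  --control-plane-endpoint=203.0.113.10:6443 \
  --apiserver-advertise-address=203.0.113.10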

@oussexist
Author

I think we can close this issue; the propagation problem was just a lack of concentration on my part.
All I needed to do when applying the deployment and propagation files was to add the kubeconfig flag pointing to the karmada API server, since I am using the cluster as a host and as the control plane at the same time.
Regards.
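A minimal sketch of what that looks like (file names are placeholders):

# Apply the workload and its PropagationPolicy against the Karmada API server,
# not against the host cluster's own API server.
kubectl apply -f deployment.yaml --kubeconfig /etc/karmada/karmada-apiserver.config
kubectl apply -f propagation.yaml --kubeconfig /etc/karmada/karmada-apiserver.config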
