[HELP WANTED] All Pending #943

Closed
warf34 opened this issue Dec 2, 2019 · 13 comments

Comments

@warf34

warf34 commented Dec 2, 2019

Hello, I want to use Katib without minikube.

I installed Kubernetes like this
and downloaded Katib like this:
git clone https://github.com/kubeflow/katib
Then I ran this command:
bash ./katib/scripts/v1alpha3/deploy.sh

But it showed this message:
error: unable to recognize "manifests/v1alpha3/0-namespace.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused

So I installed Kubernetes version 1.15.0 like this:
sudo apt-get install -y kubelet=1.15.0-00 kubeadm=1.15.0-00 kubectl=1.15.0-00
Then I ran these commands:
bash ./katib/scripts/v1alpha3/deploy.sh
kubectl get pods --all-namespaces
They showed two errors, and the Katib pods are in Pending status:

+ kubectl get validatingwebhookconfigurations katib-validating-webhook-config
Error from server (NotFound): validatingwebhookconfigurations.admissionregistration.k8s.io "katib-validating-webhook-config" not found
+ kubectl get mutatingwebhookconfigurations katib-mutating-webhook-config
Error from server (NotFound): mutatingwebhookconfigurations.admissionregistration.k8s.io "katib-mutating-webhook-config" not found
kube-system   calico-kube-controllers-59fc8847c-nz8gm   1/1     Running   0          102m
kube-system   calico-node-z7wmr                         1/1     Running   0          102m
kube-system   coredns-5c98db65d4-gdfjx                  1/1     Running   0          103m
kube-system   coredns-5c98db65d4-rhwcp                  1/1     Running   0          103m
kube-system   etcd-user-desktop                         1/1     Running   0          102m
kube-system   kube-apiserver-user-desktop               1/1     Running   0          102m
kube-system   kube-controller-manager-user-desktop      1/1     Running   0          102m
kube-system   kube-proxy-mhhb6                          1/1     Running   0          103m
kube-system   kube-scheduler-user-desktop               1/1     Running   0          103m
kubeflow      katib-controller-5d8984cc7b-8rd2m         0/1     Pending   0          81m
kubeflow      katib-db-7f9cddf68-8wjjs                  0/1     Pending   0          81m
kubeflow      katib-manager-7b9fdcd46d-rgc4t            0/1     Pending   0          81m
kubeflow      katib-ui-65c94bc47f-xfc67                 0/1     Pending   0          81m

I also tried v1alpha2:
bash ./katib/scripts/v1alpha2/deploy.sh
But the pods still showed Pending status:

kube-system   calico-kube-controllers-59fc8847c-nz8gm                 1/1     Running   0          141m
kube-system   calico-node-z7wmr                                       1/1     Running   0          141m
kube-system   coredns-5c98db65d4-gdfjx                                1/1     Running   0          141m
kube-system   coredns-5c98db65d4-rhwcp                                1/1     Running   0          141m
kube-system   etcd-user-desktop                                       1/1     Running   0          141m
kube-system   kube-apiserver-user-desktop                             1/1     Running   0          141m
kube-system   kube-controller-manager-user-desktop                    1/1     Running   0          141m
kube-system   kube-proxy-mhhb6                                        1/1     Running   0          141m
kube-system   kube-scheduler-user-desktop                             1/1     Running   0          141m
kubeflow      katib-controller-7f5b49d599-w5m87                       0/1     Pending   0          19m
kubeflow      katib-db-b48df7777-6znrp                                0/1     Pending   0          19m
kubeflow      katib-manager-7946dd5984-ddl4n                          0/1     Pending   0          19m
kubeflow      katib-manager-rest-647f694b7d-t9hjp                     0/1     Pending   0          19m
kubeflow      katib-suggestion-bayesianoptimization-94c87dd64-4twq4   0/1     Pending   0          19m
kubeflow      katib-suggestion-grid-58d9dfb5fd-8xf68                  0/1     Pending   0          19m
kubeflow      katib-suggestion-hyperband-778bb768c8-7xrsc             0/1     Pending   0          19m
kubeflow      katib-suggestion-nasrl-d84fbb8f4-9gfgm                  0/1     Pending   0          19m
kubeflow      katib-suggestion-random-7f96c4d77b-ngfvr                0/1     Pending   0          19m
kubeflow      katib-ui-6cf97db464-26gxz                               0/1     Pending   0          19m

Please help. How do I solve this problem?
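
Note on the first error: a `connection refused` on `localhost:8080` often means that kubectl found no kubeconfig (or that no cluster was running yet) and fell back to its default address. The sketch below is only a quick check under that assumption; the helper name `kubeconfig_path` is hypothetical and not part of kubectl or Katib.

```shell
# Print the first kubeconfig path kubectl would consult: the first entry of
# $KUBECONFIG (a colon-separated list) if it is set, otherwise ~/.kube/config.
# `kubeconfig_path` is a hypothetical helper name used for illustration.
kubeconfig_path() {
  if [ -n "${KUBECONFIG:-}" ]; then
    printf '%s\n' "${KUBECONFIG%%:*}"
  else
    printf '%s\n' "$HOME/.kube/config"
  fi
}

# Report whether that file is readable by the current user.
cfg=$(kubeconfig_path)
if [ -r "$cfg" ]; then
  echo "kubeconfig found: $cfg"
else
  echo "no readable kubeconfig at: $cfg"
fi
```

On a kubeadm cluster, the instructions printed at the end of `kubeadm init` copy `/etc/kubernetes/admin.conf` to `$HOME/.kube/config`; if the check above reports no kubeconfig, that step may have been skipped.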

@gaocegege
Member

@hougangliu It seems that users are getting confused here.

@warf34 Hi, I am glad to help you deploy Katib. The two webhook errors do not affect the workflow. Can you use kubectl describe to get more info about the pending pods?

@gaocegege
Member

/kind question

@warf34
Author

warf34 commented Dec 2, 2019

I redeployed with bash ./katib/scripts/v1alpha2/deploy.sh
and then ran this command:
kubectl describe pods katib-controller-7f5b49d599-fpb7d
Result:
Error from server (NotFound): pods "katib-controller-7f5b49d599-fpb7d" not found

@gaocegege
Member

Katib is installed in the kubeflow namespace, so you need to add -n kubeflow:

kubectl describe pods katib-controller-7f5b49d599-fpb7d -n kubeflow
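
To avoid missing the namespace, the describe commands can be generated from the `kubectl get pods --all-namespaces` listing itself. This is a minimal sketch, not part of Katib: the function name `describe_pending` is hypothetical, the sample rows are copied from the first post, and it assumes the default column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Sample rows copied from the `kubectl get pods --all-namespaces` output above.
sample_pods='kube-system   coredns-5c98db65d4-gdfjx                  1/1     Running   0          103m
kubeflow      katib-controller-5d8984cc7b-8rd2m         0/1     Pending   0          81m
kubeflow      katib-db-7f9cddf68-8wjjs                  0/1     Pending   0          81m'

# Read pod rows on stdin and print one `kubectl describe` command per pod
# whose STATUS column (field 4) is Pending, using field 1 as the namespace.
# `describe_pending` is a hypothetical helper name used for illustration.
describe_pending() {
  awk '$4 == "Pending" { printf "kubectl describe pod %s -n %s\n", $2, $1 }'
}

# Prints two commands, one for each Pending pod in the sample.
printf '%s\n' "$sample_pods" | describe_pending
```

Against a live cluster the same filter could be fed directly: `kubectl get pods --all-namespaces | describe_pending`.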

@warf34
Author

warf34 commented Dec 2, 2019

kubectl describe pods katib-controller-7f5b49d599-fpb7d -n kubeflow
Name:           katib-controller-7f5b49d599-fpb7d
Namespace:      kubeflow
Priority:       0
Node:           <none>
Labels:         app=katib-controller
                pod-template-hash=7f5b49d599
Annotations:    prometheus.io/scrape: true
Status:         Pending
IP:             
Controlled By:  ReplicaSet/katib-controller-7f5b49d599
Containers:
  katib-controller:
    Image:       gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller
    Ports:       443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      ./katib-controller
    Environment:
      KATIB_CORE_NAMESPACE:  kubeflow (v1:metadata.namespace)
    Mounts:
      /tmp/cert from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from katib-controller-token-xxn7t (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  katib-controller
    Optional:    false
  katib-controller-token-xxn7t:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  katib-controller-token-xxn7t
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  48s (x8 over 8m8s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

@gaocegege
Member

  Warning  FailedScheduling  48s (x8 over 8m8s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

It seems that your node is not ready or has some taints.

@warf34
Author

warf34 commented Dec 2, 2019

The node? How do I solve this problem?
Do I need to reinstall Kubernetes?

@gaocegege
Member

You could run kubectl describe nodes to get more details.

@warf34
Author

warf34 commented Dec 2, 2019

like this?

Name:               user-desktop
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=user-desktop
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.0.25/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 192.168.167.192
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 02 Dec 2019 17:49:44 +0900
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 02 Dec 2019 18:01:43 +0900   Mon, 02 Dec 2019 18:01:43 +0900   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 02 Dec 2019 18:38:57 +0900   Mon, 02 Dec 2019 17:49:42 +0900   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 02 Dec 2019 18:38:57 +0900   Mon, 02 Dec 2019 17:49:42 +0900   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 02 Dec 2019 18:38:57 +0900   Mon, 02 Dec 2019 17:49:42 +0900   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 02 Dec 2019 18:38:57 +0900   Mon, 02 Dec 2019 18:01:55 +0900   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.0.25
  Hostname:    user-desktop
Capacity:
 cpu:                8
 ephemeral-storage:  244568380Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             16311252Ki
 pods:               110
Allocatable:
 cpu:                8
 ephemeral-storage:  225394218635
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             16208852Ki
 pods:               110
System Info:
 Machine ID:                 ec48a0a7855946b3ba5ace414b2e6780
 System UUID:                35c28570-6137-0000-0000-000000000000
 Boot ID:                    f3467e34-c003-4c1b-a141-5a573e9fe4b1
 Kernel Version:             5.0.0-36-generic
 OS Image:                   Ubuntu 18.04.3 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.2
 Kubelet Version:            v1.15.6
 Kube-Proxy Version:         v1.15.6
PodCIDR:                     10.244.0.0/24
Non-terminated Pods:         (9 in total)
  Namespace                  Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                calico-kube-controllers-59fc8847c-9t2xk    0 (0%)        0 (0%)      0 (0%)           0 (0%)         37m
  kube-system                calico-node-bshdl                          250m (3%)     0 (0%)      0 (0%)           0 (0%)         37m
  kube-system                coredns-5c98db65d4-252hx                   100m (1%)     0 (0%)      70Mi (0%)        170Mi (1%)     49m
  kube-system                coredns-5c98db65d4-gnsdb                   100m (1%)     0 (0%)      70Mi (0%)        170Mi (1%)     49m
  kube-system                etcd-user-desktop                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         48m
  kube-system                kube-apiserver-user-desktop                250m (3%)     0 (0%)      0 (0%)           0 (0%)         48m
  kube-system                kube-controller-manager-user-desktop       200m (2%)     0 (0%)      0 (0%)           0 (0%)         48m
  kube-system                kube-proxy-sfmmm                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         49m
  kube-system                kube-scheduler-user-desktop                100m (1%)     0 (0%)      0 (0%)           0 (0%)         48m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                1 (12%)     0 (0%)
  memory             140Mi (0%)  340Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:
  Type    Reason                   Age                From                      Message
  ----    ------                   ----               ----                      -------
  Normal  NodeHasSufficientMemory  49m (x8 over 49m)  kubelet, user-desktop     Node user-desktop status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    49m (x8 over 49m)  kubelet, user-desktop     Node user-desktop status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     49m (x7 over 49m)  kubelet, user-desktop     Node user-desktop status is now: NodeHasSufficientPID
  Normal  Starting                 49m                kube-proxy, user-desktop  Starting kube-proxy.
  Normal  NodeReady                37m                kubelet, user-desktop     Node user-desktop status is now: NodeReady

@gaocegege
Member

Taints: node-role.kubernetes.io/master:NoSchedule

The node has a taint.
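
The taint line is easy to miss in a long `kubectl describe nodes` dump. The sketch below pulls it out per node; the function name `node_taints` is hypothetical, the sample lines are copied from the output above, and only the first taint listed for each node is reported (additional taints appear on continuation lines that this sketch ignores).

```shell
# Sample lines copied from the `kubectl describe nodes` output above.
sample_node='Name:               user-desktop
Roles:              master
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false'

# Read `kubectl describe nodes` output on stdin and print "<node> <taint>"
# for the first taint of each node ("<none>" when the node has no taint).
# `node_taints` is a hypothetical helper name used for illustration.
node_taints() {
  awk '/^Name:/ { node = $2 } /^Taints:/ { print node, $2 }'
}

# Prints: user-desktop node-role.kubernetes.io/master:NoSchedule
printf '%s\n' "$sample_node" | node_taints
```

Here the only node is the master, and its NoSchedule taint is why the scheduler reports "0/1 nodes are available" for the Katib pods, which carry no matching toleration.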

@warf34
Author

warf34 commented Dec 2, 2019

How do I solve it?

@yeya24
Contributor

yeya24 commented Dec 2, 2019

kubectl taint node "your master node name" node-role.kubernetes.io/master-
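
In the command above, the trailing `-` after the taint key tells kubectl to remove that taint, which allows ordinary pods to be scheduled on the master. As a sketch for this thread's cluster, the helper below only prints the command for a given node name so it can be reviewed before running; `untaint_cmd` is a hypothetical name, and `user-desktop` is the node name taken from the `kubectl describe nodes` output earlier in the thread.

```shell
# Print (do not run) the command that removes the master NoSchedule taint
# from the given node. The trailing "-" after the key means "remove".
# `untaint_cmd` is a hypothetical helper name used for illustration.
untaint_cmd() {
  node=$1
  printf 'kubectl taint nodes %s node-role.kubernetes.io/master-\n' "$node"
}

# Prints: kubectl taint nodes user-desktop node-role.kubernetes.io/master-
untaint_cmd user-desktop
```

Removing the taint suits a single-node test cluster like this one; joining a worker node to the cluster is the alternative if the master should stay free of workloads.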

@andreyvelich
Member

I am closing this issue. Feel free to reopen it if you have any other questions.
Related issue: #1174.
