This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

kubelet doesn't attempt to start api server #19

Closed
dimm0 opened this issue Jan 13, 2017 · 7 comments


@dimm0

dimm0 commented Jan 13, 2017

Tectonic Version

1.4.7

Environment

Baremetal, CoreOS VMs in KVM

Expected Behavior

The kubelet service on the controller node is expected to start the api-server

Actual Behavior

I followed all the steps in the instructions, but the api-server was never started.

Lots of errors in the log:

vct06.sdsc.edu&resourceVersion=0: dial tcp 192.168.0.2:443: getsockopt: connection refused
Jan 13 01:44:19 controller-vct06.sdsc.edu kubelet-wrapper[2385]: E0113 01:44:19.768794    2385 reflector.go:203] pkg/kubelet/config/apiserver.go:43: Failed to list *api.Pod: Get https://controllers-vct06.sdsc.edu:443/api/v1/pods?fieldSelector=spec.nodeName%3Dcontroller-vct06.sdsc.edu&resourceVersion=0: dial tcp 192.168.0.2:443: getsockopt: connection refused
Jan 13 01:44:19 controller-vct06.sdsc.edu kubelet-wrapper[2385]: E0113 01:44:19.768875    2385 reflector.go:203] pkg/kubelet/kubelet.go:384: Failed to list *api.Service: Get https://controllers-vct06.sdsc.edu:443/api/v1/services?resourceVersion=0: dial tcp 192.168.0.2:443: getsockopt: connection refused
Jan 13 01:44:20 controller-vct06.sdsc.edu kubelet-wrapper[2385]: E0113 01:44:20.770320    2385 reflector.go:203] pkg/kubelet/config/apiserver.go:43: Failed to list *api.Pod: Get https://controllers-vct06.sdsc.edu:443/api/v1/pods?fieldSelector=spec.nodeName%3Dcontroller-vct06.sdsc.edu&resourceVersion=0: dial tcp 192.168.0.2:443: getsockopt: connection refused
Jan 13 01:44:20 controller-vct06.sdsc.edu kubelet-wrapper[2385]: E0113 01:44:20.770326    2385 reflector.go:203] pkg/kubelet/kubelet.go:384: Failed to list *api.Service: Get https://controllers-vct06.sdsc.edu:443/api/v1/services?resourceVersion=0: dial tcp 192.168.0.2:443: getsockopt: connection refused
Jan 13 01:44:20 controller-vct06.sdsc.edu kubelet-wrapper[2385]: E0113 01:44:20.770359    2385 reflector.go:203] pkg/kubelet/kubelet.go:403: Failed to list *api.Node: Get https://controllers-vct06.sdsc.edu:443/api/v1/nodes?fieldSelector=metadata.name%3Dcontroller-vct06.sdsc.edu&resourceVersion=0: dial tcp 192.168.0.2:443: getsockopt: connection refused

kubelet.service:

[Unit]
Description=Kubelet via Hyperkube ACI
Wants=flanneld.service
[Service]
Environment="RKT_OPTS=--uuid-file-save=/var/run/kubelet-pod.uuid \
  --volume=resolv,kind=host,source=/etc/resolv.conf \
  --mount volume=resolv,target=/etc/resolv.conf \
  --volume var-log,kind=host,source=/var/log \
  --mount volume=var-log,target=/var/log"
EnvironmentFile=/etc/kubernetes/kubelet.env
ExecStartPre=/usr/bin/systemctl is-active flanneld.service
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /srv/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
  --api-servers=https://controllers-vct06.sdsc.edu:443 \
  --kubeconfig=/etc/kubernetes/kubeconfig \
  --lock-file=/var/run/lock/kubelet.lock \
  --exit-on-lock-contention \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --allow-privileged \
  --hostname-override=controller-vct06.sdsc.edu \
  --node-labels=master=true \
  --minimum-container-ttl-duration=6m0s \
  --cluster_dns=10.3.0.10 \
  --cluster_domain=cluster.local
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

/etc/kubernetes/manifests/kube-apiserver.yaml doesn't exist on the node.
Probably related to coreos/coreos-kubernetes#626, but resolv.conf is already added to the config file.
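The "connection refused" loop in the log is consistent with the kubelet having no kube-apiserver static pod to run. A minimal sketch of checking for that condition (the path is the `--pod-manifest-path` from the unit above; `check_manifest` is a hypothetical helper, not part of any installer):

```shell
#!/bin/sh
# Hypothetical helper: report whether the kubelet has an apiserver manifest
# to start. /etc/kubernetes/manifests is the --pod-manifest-path from the
# kubelet.service unit in this issue.
check_manifest() {
  # $1: static pod manifest directory
  if [ -f "$1/kube-apiserver.yaml" ]; then
    echo "apiserver manifest present"
  else
    echo "apiserver manifest MISSING: kubelet has nothing to start"
  fi
}

check_manifest /etc/kubernetes/manifests
```

On an affected controller this prints the MISSING line until bootkube (or a manual copy of the assets) delivers the manifest.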

Also the installer doesn't see the etcd service on the controller, although it's running

controller-vct06 core # systemctl status etcd2
● etcd2.service - etcd2
   Loaded: loaded (/usr/lib/systemd/system/etcd2.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/etcd2.service.d
           └─40-etcd-cluster.conf
   Active: active (running) since Fri 2017-01-13 01:22:21 UTC; 29min ago
 Main PID: 2095 (etcd2)
    Tasks: 30
   Memory: 22.5M
      CPU: 5.077s
   CGroup: /system.slice/etcd2.service
           └─2095 /usr/bin/etcd2
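systemd reporting etcd2 as active does not guarantee the installer can reach it; the installer polls etcd's client endpoint over the network. A hedged sketch of probing that endpoint directly (the default etcd2 client URL `http://127.0.0.1:2379` is an assumption; `etcd_healthy` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: probe etcd's v2 /health endpoint, which reflects what
# a remote installer actually sees, rather than the local systemd unit state.
etcd_healthy() {
  # $1: etcd client URL, e.g. http://127.0.0.1:2379
  if curl -fsS --connect-timeout 3 "$1/health" 2>/dev/null | grep -q 'true'; then
    echo "etcd reachable at $1"
  else
    echo "etcd NOT reachable at $1"
  fi
}

etcd_healthy http://127.0.0.1:2379
```

If this fails from the installer host while succeeding on the controller itself, the problem is network reachability, not etcd.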

Thanks for your help!

@naphthalene

I am running into this issue also - would love some pointers

@eugene-chow

eugene-chow commented Jan 17, 2017

This is because kube-apiserver isn't running. See http://stackoverflow.com/questions/33482418/kubernetes-api-pod-refuses-connection.

At the second-to-last step of Tectonic's installation, Connect Nodes, copy the contents of assets.zip (which contains the kube-apiserver.yaml manifest, among others) to the controller. When you run bootkube in this step, you should see this:

[58864.729983] bootkube[5]: 	Pod Status:        pod-checkpointer	Running
[58864.730651] bootkube[5]: 	Pod Status:          kube-apiserver	Running
[58864.731117] bootkube[5]: 	Pod Status:          kube-scheduler	Running
[58864.731513] bootkube[5]: 	Pod Status: kube-controller-manager	Running
[58864.732169] bootkube[5]: All self-hosted control plane components successfully started
Waiting for Kubernetes API...
Waiting for Kubernetes API...
{
  "major": "1",
  "minor": "4",
  "gitVersion": "v1.4.7+coreos.0",
  "gitCommit": "0581d1a5c618b404bd4766544bec479aedef763e",
  "gitTreeState": "clean",
  "buildDate": "2016-12-12T19:04:11Z",
  "goVersion": "go1.6.3",
  "compiler": "gc",
  "platform": "linux/amd64"
}
Waiting for Kubernetes components...

Creating Heapster
Creating Tectonic Namespace
Creating Initial Roles
Creating Tectonic ConfigMap
Creating Tectonic Secrets
Creating Tectonic Identity
Creating Tectonic Console
Creating Tectonic Monitoring
Creating Ingress
Creating Tectonic Stats Emitter

For more info on how the various services start up, take a look at the instructions for a hands-on install https://coreos.com/kubernetes/docs/latest/deploy-master.html.
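The "Waiting for Kubernetes API..." lines in the bootkube output above are a poll against the apiserver's /version endpoint. A rough sketch of the same loop, useful for checking whether the control plane ever came up after running bootkube (the endpoint is the hostname from this issue, a placeholder; `wait_for_api` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper mirroring the "Waiting for Kubernetes API..." loop in
# the bootkube output: poll /version until the apiserver answers or we give
# up. Always exits 0; the echoed message carries the result.
wait_for_api() {
  # $1: API base URL, $2: max attempts
  i=0
  while [ "$i" -lt "$2" ]; do
    if curl -ks --connect-timeout 2 "$1/version" 2>/dev/null | grep -q gitVersion; then
      echo "API up"
      return 0
    fi
    echo "Waiting for Kubernetes API..."
    i=$((i + 1))
    sleep 1
  done
  echo "API never came up"
}

wait_for_api https://controllers-vct06.sdsc.edu:443 2
```

If this never reports "API up", check the static pod manifests and the kubelet journal on the controller before suspecting bootkube itself.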

@naphthalene

I figured this out yesterday too - it seems the installer can get into a state where it never shows you the assets-installation/bootkube step. Running it while the installer is waiting for the control plane to start is safe and worked for me.

@mfburnett
Contributor

hey @dimm0, @eugene-chow, and @naphthalene - thanks for filing and working through this issue together. I'll make sure to pass this along to our docs team to add as a troubleshooting solution for bare metal.

@dghubble
Member

The kubelet service on the controller node is expected to start the api-server

Nodes boot from disk and start etcd2 and the kubelet. The api-server is not expected to be running at this initial stage (a status message erroneously claimed that's what the installer was waiting on). We've fixed this messaging for the next release so expectations are correct (i.e. waiting for the kubelet). The Kubernetes control plane is bootstrapped from the next screen, Connect Nodes, as @eugene-chow noted.

@dimm0
Author

dimm0 commented Feb 10, 2017

I was able to get past this point by running assets/bootkube-start from the assets archive on the controller node. After that, it got stuck with lots of messages like this:

Feb 07 03:32:52 controller-k8s.sdsc.edu dockerd[1298]: time="2017-02-07T03:32:52.753134342Z" level=error msg="Handler for GET /containers/bac76c9fc8e4f5db5c1733153a111d3e6d5cd06e581ac4ee46ea596774eabca8/json returned error: No such container: bac76c9fc8e4f5db5c1733153a111d3e6d5cd06e581ac4ee46ea596774eabca8"

The installer is still waiting for services to appear.

@dimm0
Author

dimm0 commented Mar 8, 2017

All fixed. My problem was that the installer node must be able to reach the controller too, not only the provisioning node.
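The fix described here boils down to a reachability requirement: the installer host, not just the provisioning host, must be able to open a connection to the controller's API port. A sketch of verifying that from the installer node (hostname and port are the ones from this issue; `can_reach` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: check that this host can open a TLS connection to the
# controller's API port. Any HTTP response (even 401/403) counts as
# reachable; only connect/DNS failures count as unreachable.
can_reach() {
  # $1: host, $2: port
  if curl -ks --connect-timeout 3 "https://$1:$2/" >/dev/null 2>&1; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}

can_reach controllers-vct06.sdsc.edu 443
```

Run this on the installer node before starting the installer; "unreachable" here reproduces the stall described in this thread.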

@dimm0 dimm0 closed this as completed Mar 8, 2017