
etcd-operator panics on self-hosted bootkube #851

Closed
janwillies opened this issue Mar 2, 2017 · 4 comments · Fixed by #852

janwillies commented Mar 2, 2017

I'm running a self-hosted bootkube cluster (see kubernetes-retired/bootkube#346) and ran into the following problem when trying to scale etcd.
Scaling etcd:

kubectl --namespace=kube-system get cluster.etcd kube-etcd -o json > etcd.json && \
vim etcd.json && \
curl -H 'Content-Type: application/json' -X PUT --data @etcd.json http://127.0.0.1:8080/apis/etcd.coreos.com/v1beta1/namespaces/kube-system/clusters/kube-etcd

Output:

{
  "apiVersion": "etcd.coreos.com/v1beta1",
  "kind": "Cluster",
  "metadata": {
    "name": "kube-etcd",
    "namespace": "kube-system",
    "selfLink": "/apis/etcd.coreos.com/v1beta1/namespaces/kube-system/clusters/kube-etcd",
    "uid": "1b3c4d81-feef-11e6-9fc2-0026558252a6",
    "resourceVersion": "96374",
    "creationTimestamp": "2017-03-02T02:22:38Z"
  },
  "spec": {
    "selfHosted": {
      "bootMemberClientEndpoint": "http://10.7.183.59:12379"
    },
    "size": 3,
    "version": "3.1.0"
  },
  "status": {
    "conditions": null,
    "controlPaused": false,
    "currentVersion": "",
    "phase": "Failed",
    "reason": "cluster failed to be created",
    "size": 0,
    "targetVersion": ""
  }
}

etcd-operator log:

time="2017-03-02T15:07:22Z" level=info msg="etcd-operator Version: 0.2.1"
time="2017-03-02T15:07:22Z" level=info msg="Git SHA: ded9a44"
time="2017-03-02T15:07:22Z" level=info msg="Go Version: go1.7.5"
time="2017-03-02T15:07:22Z" level=info msg="Go OS/Arch: linux/amd64" 
time="2017-03-02T15:07:22Z" level=info msg="finding existing clusters..." pkg=controller 
time="2017-03-02T15:07:22Z" level=info msg="ignore failed cluster kube-etcd" pkg=controller
time="2017-03-02T15:07:22Z" level=info msg="starts running from watch version: 96182" pkg=controller 
time="2017-03-02T15:07:22Z" level=info msg="start watching at 96182" pkg=controller 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe8 pc=0x9844b7]

goroutine 76 [running]:
panic(0x15975a0, 0xc42000a050)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).send(0x0, 0xc4205d8340)
        /home/ubuntu/code/golang/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:212 +0x37
github.com/coreos/etcd-operator/pkg/cluster.(*Cluster).Update(0x0, 0xc420136380)
        /home/ubuntu/code/golang/src/github.com/coreos/etcd-operator/pkg/cluster/cluster.go:356 +0x77
github.com/coreos/etcd-operator/pkg/controller.(*Controller).Run.func2(0xc420064300, 0xc4200980a0)
        /home/ubuntu/code/golang/src/github.com/coreos/etcd-operator/pkg/controller/controller.go:167 +0x21e
created by github.com/coreos/etcd-operator/pkg/controller.(*Controller).Run
        /home/ubuntu/code/golang/src/github.com/coreos/etcd-operator/pkg/controller/controller.go:182 +0x315

My guess is that after etcd-operator panics and restarts, it can't find the already-running etcd cluster ("size": 0).
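The stack trace shows cluster.(*Cluster).send and (*Cluster).Update being called with a nil receiver (the 0x0 argument), which matches that guess. Below is a minimal, self-contained Go sketch of this failure mode; the type layout and field names are hypothetical stand-ins, not the operator's actual code:

package main

// Hypothetical stand-in for the operator's Cluster type; the eventCh field
// is an assumption used only to illustrate the nil dereference.
type Cluster struct {
	eventCh chan string
}

// Calling a method on a nil pointer is legal in Go, but a field access
// through the nil receiver faults at the field's offset from address 0,
// which is why the trace reports a small address like addr=0xe8.
func (c *Cluster) send(ev string) {
	c.eventCh <- ev
}

func (c *Cluster) Update(ev string) {
	c.send(ev)
}

func main() {
	var c *Cluster // e.g. a failed cluster that was ignored at startup, so the lookup returned nil
	c.Update("modify kube-etcd") // panic: invalid memory address or nil pointer dereference
}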

@xiang90 @hongchaodeng

xiang90 (Collaborator) commented Mar 2, 2017

> trying to scale etcd I ran into the following problems:

How did you scale etcd? What requests did you send to the etcd operator?

Can you reproduce this issue? If so, what are the steps to reproduce?

janwillies (Author) commented

I've updated my comment to make it clearer.

In the current cluster state I can reproduce this every time. Let me set up a new cluster and see if I can reproduce it there as well.

janwillies (Author) commented

On a new cluster, I killed the operator a few times and scaled up and down more than once, but I can't reproduce it anymore. I'll leave it up to you whether to close this issue.

xiang90 (Collaborator) commented Mar 2, 2017

@janwillies OK. I think we might be hitting a race. I just wanted to make sure it does not happen all the time, which more or less confirms my guess. We will get a fix out for you soon.
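For illustration, here is a hedged sketch of the kind of guard that would avoid this race: before forwarding a watch event, check whether the cluster actually exists in the controller's map and drop the event otherwise. The names below (handleUpdate, clusters, ClusterSpec) are assumptions, not the operator's real identifiers; the actual fix is the one referenced in #852.

package controller

// Hypothetical stubs standing in for the operator's real types.
type ClusterSpec struct{ Size int }

type Cluster struct{}

func (c *Cluster) Update(spec *ClusterSpec) { /* reconcile toward the new spec */ }

type Controller struct {
	clusters map[string]*Cluster
}

// handleUpdate drops events for clusters the controller does not track,
// e.g. a cluster ignored at startup because its phase was "Failed", so
// Update is never called on a nil *Cluster.
func (ctrl *Controller) handleUpdate(name string, spec *ClusterSpec) {
	cl, ok := ctrl.clusters[name]
	if !ok || cl == nil {
		return
	}
	cl.Update(spec)
}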

hongchaodeng added a commit to hongchaodeng/etcd-operator that referenced this issue Mar 2, 2017