This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

ETCD key already exists #1622

Closed
ewoutp opened this issue Jun 30, 2016 · 1 comment

@ewoutp
Contributor

ewoutp commented Jun 30, 2016

While testing current master on a cluster where all fleet instances run v0.11.7, I get these errors when starting a master build of fleet on one of the machines.

fleetd[4900]: INFO fleetd.go:75: Starting fleetd version v0.13.0-31-geb95cd9
fleetd[4900]: INFO fleetd.go:203: No provided or default config file found - proceeding without
fleetd[4900]: INFO server.go:175: Establishing etcd connectivity
fleetd[4900]: ERROR server.go:192: Server register machine failed: 105: Key already exists (/_coreos.com/fleet/machines/d5ca2e9d4039e480807464280c9ab4a2/object) [203152748]

A little while later I get this panic.

fleetd[4900]: ERROR server.go:192: Server register machine failed: 105: Key already exists (/_coreos.com/fleet/machines/d5ca2e9d4039e480807464280c9ab4a2/object) [203153266]
fleetd[4900]: INFO server.go:188: hrt.Register() success
fleetd[4900]: INFO server.go:198: Starting server components
fleetd[4900]: panic: runtime error: invalid memory address or nil pointer dereference
fleetd[4900]: [signal 0xb code=0x1 addr=0x20 pc=0x4bc024]
fleetd[4900]: goroutine 49 [running]:
fleetd[4900]: panic(0xc06080, 0xc82000a0a0)
fleetd[4900]:         /usr/local/go/src/runtime/panic.go:481 +0x3e6
fleetd[4900]: github.com/coreos/fleet/pkg.(*reconciler).Run.func1(0xc8201cef00, 0xc8201b9560, 0xc8201cf2c0)
fleetd[4900]:         /opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:65 +0x74
fleetd[4900]: created by github.com/coreos/fleet/pkg.(*reconciler).Run
fleetd[4900]:         /opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:69 +0x89
systemd[1]: fleet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: fleet.service: Unit entered failed state.
systemd[1]: fleet.service: Failed with result 'exit-code'.
@dongsupark
Contributor

Thanks for the bug report.

First, a message like "Server register machine failed" is not always bad. It is normally a good sign that the etcd registry is refusing to register a duplicate machine ID. You are seeing this message in 0.13 because #1561 was merged. That said, in this case fleetd should log a WARNING instead of an ERROR. (minor TODO)
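
To make the point about duplicate registration concrete, here is a minimal sketch of a create-only write and of treating the "already exists" outcome as a warning. It is not fleet's code and does not use the etcd client: memRegistry, create, and registerMachine are hypothetical names, and an in-memory map stands in for etcd. Only the key layout and the "105: Key already exists" text are taken from the logs in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errKeyExists mirrors etcd error code 105 as it appears in the logs above.
var errKeyExists = errors.New("105: Key already exists")

// memRegistry is a hypothetical in-memory stand-in for the etcd registry.
type memRegistry struct {
	mu   sync.Mutex
	keys map[string]string
}

func newMemRegistry() *memRegistry {
	return &memRegistry{keys: make(map[string]string)}
}

// create stores value under key only if the key is absent (create-only write).
func (r *memRegistry) create(key, value string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.keys[key]; ok {
		return fmt.Errorf("%w (%s)", errKeyExists, key)
	}
	r.keys[key] = value
	return nil
}

// registerMachine attempts a create-only registration and returns the log
// level that the outcome deserves, along with the error (if any).
func registerMachine(r *memRegistry, machID, state string) (string, error) {
	key := "/_coreos.com/fleet/machines/" + machID + "/object"
	err := r.create(key, state)
	switch {
	case err == nil:
		return "INFO", nil
	case errors.Is(err, errKeyExists):
		return "WARNING", err // expected guard against a duplicate machine ID
	default:
		return "ERROR", err
	}
}

func main() {
	reg := newMemRegistry()
	// Register the same machine ID twice; the second attempt hits the guard.
	for i := 0; i < 2; i++ {
		level, err := registerMachine(reg, "d5ca2e9d4039e480807464280c9ab4a2", "up")
		fmt.Println(level, err)
	}
}
```

The second call fails because the key is already present, which is the registry doing its job rather than a fault in the caller.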

On the other hand, fleetd should not panic like this, so this is indeed a bug. The "pkg/reconcile.go:65" frame points to this line:

            case <-r.eStream.Next(abort):

I'm having trouble understanding how this line could result in a panic. One possible scenario is:

  • The agent reconciler sends periodic heartbeats, retrying every few seconds. Normally registration should succeed right away, but in this case it doesn't, probably because of the other fleet agents running 0.11.7, or some strange state in the etcd registry.
  • At some point it finally manages to register. But by then the event stream was gone, or some other channel was closed. Not sure how.
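
One way the select line could panic can be sketched as follows, assuming r.eStream is an interface value that happens to be nil when the loop starts. Every channel expression in a select is evaluated on entry, so calling Next on a nil interface fails before any case is chosen. The eventStream and reconciler types below are simplified, hypothetical stand-ins, not the real fleet definitions, and the guard is only an illustration of a nil check, not the actual patch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// eventStream is a hypothetical stand-in for the type behind r.eStream.
type eventStream interface {
	Next(stop chan struct{}) chan struct{}
}

// reconciler is a simplified, assumed shape of the reconciler in the trace.
type reconciler struct {
	ival    time.Duration
	eStream eventStream
}

// waitUnguarded mirrors the select at reconcile.go:65. Every case expression
// is evaluated on entry to select, so a nil eStream panics immediately.
// The panic is recovered here only so that the example can report it.
func (r *reconciler) waitUnguarded(abort chan struct{}) (trigger string, err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("recovered: %v", rec)
		}
	}()
	select {
	case <-r.eStream.Next(abort):
		return "event", nil
	case <-time.After(r.ival):
		return "tick", nil
	}
}

// waitGuarded checks the stream first and returns instead of panicking.
func (r *reconciler) waitGuarded(abort chan struct{}) (string, error) {
	if r.eStream == nil {
		return "", errors.New("event stream is not available")
	}
	select {
	case <-r.eStream.Next(abort):
		return "event", nil
	case <-time.After(r.ival):
		return "tick", nil
	}
}

func main() {
	r := &reconciler{ival: 10 * time.Millisecond} // eStream left nil on purpose
	abort := make(chan struct{})
	fmt.Println(r.waitUnguarded(abort))
	fmt.Println(r.waitGuarded(abort))
}
```

Note that the timer case never gets a chance to fire in the unguarded version: the nil method call happens while the select is still being set up.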

dongsupark pushed a commit to dongsupark/fleet that referenced this issue Jul 1, 2016
To avoid potential nil-dereferences in agent reconciliation, check
that eStream is available before accessing eStream as well as
eStream.Next(). Otherwise, just return.

Example error logs:
====
ERROR server.go:192: Server register machine failed: 105: Key already exists
(/_coreos.com/fleet/machines/d5ca2e9d4039e480807464280c9ab4a2/object) [203153266]
INFO server.go:188: hrt.Register() success
INFO server.go:198: Starting server components
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4bc024]
goroutine 49 [running]:
panic(0xc06080, 0xc82000a0a0)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/coreos/fleet/pkg.(*reconciler).Run.func1(0xc8201cef00, 0xc8201b9560, 0xc8201cf2c0)
        /opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:65 +0x74
created by github.com/coreos/fleet/pkg.(*reconciler).Run
        /opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:69 +0x89
systemd[1]: fleet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: fleet.service: Unit entered failed state.
systemd[1]: fleet.service: Failed with result 'exit-code'.
====

Fixes: coreos#1622
dongsupark pushed a commit to dongsupark/fleet that referenced this issue Jul 1, 2016
To avoid potential nil-dereferences in agent reconciliation, check
that eStream is available before accessing eStream. Otherwise,
just return.

Fixes: coreos#1622