agent: check for existing machine-ids on startup #615

jonboulle · 2014-07-03T17:27:19Z

If multiple systems in a cluster happen to have the same machine-id, all hell can break loose. It would be nice if the agent could do a simple sanity check when first registering itself in the cluster: if there's an existing machine with its machine-id, it can log something useful and/or back off and try again (e.g. in case a previously pushed machineState just hasn't TTLed yet).

bcwaldon · 2014-07-03T17:30:05Z

What situations can cause this to happen?

jonboulle · 2014-07-03T17:32:05Z

Cloned VMs (come up a few times on the mailing list).
Using KVM (until the very newest version of systemd, which we're not using yet).
Someone trying to run multiple instances of fleet on a single machine.

ibuildthecloud · 2014-07-11T22:55:10Z

+1

bcwaldon · 2014-08-01T18:31:50Z

If we do this, I believe we would lose the ability for an agent to go down with SIGKILL (i.e. no state purge) and come back up without any service interruption. If we make an agent wait on its state key to time out before it can rejoin, that will cause unnecessary churn in the schedule. Once we switch to a more direct presence mechanism (direct API requests) this change won't be as disruptive.
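One way to soften this trade-off, sketched under stated assumptions: if the stored state carries something that identifies the host besides the machine-id, a restarting agent could recognize its own stale entry and rejoin immediately. The `machineState` struct and `isConflict` helper below are hypothetical, and using the public IP as the distinguishing field is an assumption made for illustration, not something this thread confirms.

```go
package main

import "fmt"

// machineState is a hypothetical record of what an agent publishes about itself.
type machineState struct {
	ID       string
	PublicIP string
}

// isConflict reports whether existing looks like a different machine sharing
// self's machine-id. An entry with the same ID and the same address is
// treated as this agent's own stale state, so no waiting is required.
func isConflict(existing *machineState, self machineState) bool {
	if existing == nil || existing.ID != self.ID {
		return false
	}
	return existing.PublicIP != self.PublicIP
}

func main() {
	self := machineState{ID: "abc", PublicIP: "10.0.0.1"}
	stale := machineState{ID: "abc", PublicIP: "10.0.0.1"}
	clone := machineState{ID: "abc", PublicIP: "10.0.0.2"}

	fmt.Println(isConflict(&stale, self)) // false: our own leftover state
	fmt.Println(isConflict(&clone, self)) // true: same ID, different host
	fmt.Println(isConflict(nil, self))    // false: nothing registered yet
}
```

A comparison like this would not catch multiple instances running on a single machine, since they would share both the ID and the address.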

carmstrong · 2014-08-15T21:03:18Z

Ah, I thought #697 fixed this, but it looks like it didn't. I must've misread the release notes.

carmstrong · 2014-08-15T21:03:33Z

(The whole 'accidentally running fleet instead of fleetctl' issue)

bcwaldon · 2014-08-21T23:06:03Z

@carmstrong we should really just move fleet to fleetd.

carmstrong · 2014-08-21T23:08:29Z

> we should really just move fleet to fleetd.

👍

ibuildthecloud · 2014-08-22T00:07:54Z

+1 renaming to fleetd

jonboulle · 2014-08-22T22:56:09Z

#807

steveej · 2015-05-26T06:11:34Z

I strongly vote for including a sanity check to ensure unique machine-ids. I ran into this with cloned VMs and spent hours debugging until @jonboulle pointed me in the right direction.

jasonkeene · 2015-07-22T02:24:12Z

+1 this was a head scratcher

tixxdz · 2016-03-23T07:20:05Z

@antrik could you take this one? It seems there is a PR for it, #1288.

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

@wuqixuan

A new test, TestDetectMachineId, checks whether an etcd registration fails when a duplicate entry for /etc/machine-id gets registered to different machines. Note that it's expected to fail in this case. The goal of the test is to cover the improvement patch by @wuqixuan ("fleetd: Detecting the existing machine-id"). See also coreos#1288, coreos#1241, coreos#615. Suggested-by: Olaf Buddenhagen <olaf@endocode.com> Cc: wuqixuan <wuqixuan@huawei.com>

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

@wuqixuan

A new test, TestDetectMachineId, checks whether an etcd registration fails when a duplicate entry for /etc/machine-id gets registered to different machines. Note that it's expected to fail in this case. The goal of the test is to cover the improvement patch by @wuqixuan ("fleetd: Detecting the existing machine-id"). See also coreos#1288, coreos#1241, coreos#615. Suggested-by: Olaf Buddenhagen <olaf@endocode.com> Cc: wuqixuan <wuqixuan@huawei.com> Cc: Djalal Harouni <djalal@endocode.com>

bcwaldon added the bug label Jul 8, 2014

jonboulle mentioned this issue Aug 6, 2014

Monitor heartbeat attempt can time out, but succeed #750

Closed

jonboulle mentioned this issue Jun 5, 2015

fleet should detect machines with the same machine-id #1241

Closed

wuqixuan pushed a commit to wuqixuan/fleet that referenced this issue Jul 3, 2015

fleetd: Detecting the existing machine-id

8a2b52a

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

wuqixuan mentioned this issue Jul 3, 2015

fleetd: Detecting the existing machine-id #1288

Closed

wuqixuan pushed a commit to wuqixuan/fleet that referenced this issue Jul 6, 2015

fleetd: Detecting the existing machine-id

491b932

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

This was referenced Jul 9, 2015

fleetd: prevent to run more than one fleetd deamon #1229

Closed

Fleetd should prevent to run more than one daemon #1220

Closed

wuqixuan pushed a commit to wuqixuan/fleet that referenced this issue Jul 11, 2015

fleetd: Detecting the existing machine-id

495f137

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

jonboulle added kind/bug and removed bug labels Sep 24, 2015

jonboulle added this to the v0.13.0 milestone Jan 25, 2016

tixxdz mentioned this issue Apr 6, 2016

fleetd fails on nodes with ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([167 != 168]) [168] #1533

Closed

tixxdz added the priority/P1 label Apr 16, 2016

dongsupark pushed a commit to endocode/fleet that referenced this issue Apr 19, 2016

fleetd: Detecting the existing machine-id

d1a5c06

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

dongsupark mentioned this issue Apr 19, 2016

fleetd: detect the existing machine ID #1561

Merged

dongsupark pushed a commit to endocode/fleet that referenced this issue Apr 20, 2016

fleetd: Detecting the existing machine-id

d7d8925

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

dongsupark pushed a commit to endocode/fleet that referenced this issue Apr 21, 2016

fleetd: Detecting the existing machine-id

32c1760

Now support detecting the existing machine-id on startup. Fixes coreos#1241 coreos#615

tixxdz closed this as completed in #1561 Apr 21, 2016
