This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

agent: check for existing machine-ids on startup #615

Closed
jonboulle opened this issue Jul 3, 2014 · 13 comments · Fixed by #1561

Comments

@jonboulle
Contributor

If multiple systems in a cluster happen to have the same machine-id, all hell can break loose. It would be nice if the agent could do a simple sanity check when first registering itself in the cluster: if there's an existing machine with its machine-id, it can log something useful and/or back off and try again (e.g. in case a previously pushed machineState just hasn't TTLed yet).
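
To make the proposal concrete, here is a minimal sketch in Python of the check-then-back-off idea. It is not fleet's code: `Registry`, `DuplicateMachineID`, and `register_with_backoff` are hypothetical names, and the in-memory dictionary only stands in for the TTL-based machine state that an agent pushes to the cluster.

```python
import time
from typing import Callable, Dict


class DuplicateMachineID(Exception):
    """Raised when a live entry for the same machine-id already exists."""


class Registry:
    """Hypothetical in-memory stand-in for the cluster registry (an assumption)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expires: Dict[str, float] = {}  # machine-id -> expiry timestamp

    def register(self, machine_id: str) -> None:
        now = self.clock()
        expiry = self._expires.get(machine_id)
        # Sanity check: refuse to register over an entry that has not expired.
        if expiry is not None and expiry > now:
            raise DuplicateMachineID(machine_id)
        self._expires[machine_id] = now + self.ttl


def register_with_backoff(
    registry: Registry,
    machine_id: str,
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try to register, doubling the delay after each conflict.

    Returns the attempt number that succeeded. Re-raises DuplicateMachineID
    once the attempts are exhausted so the caller can log something useful.
    """
    for attempt in range(1, attempts + 1):
        try:
            registry.register(machine_id)
            return attempt
        except DuplicateMachineID:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

In this sketch, with a 5-second TTL and delays of 1, 2, and 4 seconds, a stale entry left by the same machine has expired by the fourth attempt, so registration succeeds. With fewer attempts the entry is still live and the error propagates, which is the point where a real agent could log the conflict.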

@bcwaldon
Contributor

bcwaldon commented Jul 3, 2014

What situations can cause this to happen?

@jonboulle
Contributor Author

Cloned VMs (come up a few times on the mailing list).
Using KVM (until the very newest version of systemd, which we're not using yet).
Someone trying to run multiple instances of fleet on a single machine.

bcwaldon added the bug label Jul 8, 2014
@ibuildthecloud

+1

@bcwaldon
Contributor

bcwaldon commented Aug 1, 2014

If we do this, I believe we would lose the ability for an agent to go down with SIGKILL (i.e. no state purge) and come back up without any service interruption. If we make an agent wait on its state key to time out before it can rejoin, that will cause unnecessary churn in the schedule. Once we switch to a more direct presence mechanism (direct API requests) this change won't be as disruptive.

@carmstrong

Ah, I thought #697 fixed this, but it looks like it didn't. I must've misread the release notes.

@carmstrong

(The whole 'accidentally running fleet instead of fleetctl' issue)

@bcwaldon
Contributor

@carmstrong we should really just move fleet to fleetd.

@carmstrong

> we should really just move fleet to fleetd.

👍

@ibuildthecloud

+1 renaming to fleetd

@jonboulle
Contributor Author

#807

@steveej
Contributor

steveej commented May 26, 2015

I strongly vote for including a sanity check to ensure unique machine-ids. I ran into this with cloned VMs and spent hours debugging until @jonboulle pointed me in the right direction.

wuqixuan pushed a commit to wuqixuan/fleet that referenced this issue Jul 3, 2015 (pushed again Jul 6 and Jul 11, 2015)
Now support detecting the existing machine-id on startup.

Fixes coreos#1241 coreos#615
@jasonkeene

+1 this was a head scratcher

jonboulle added kind/bug and removed bug labels Sep 24, 2015
jonboulle added this to the v0.13.0 milestone Jan 25, 2016
@tixxdz
Contributor

tixxdz commented Mar 23, 2016

@antrik could you take this one? It seems there is already a PR for it: #1288

dongsupark pushed a commit to endocode/fleet that referenced this issue Apr 19, 2016 (pushed again Apr 20 and Apr 21, 2016)
Now support detecting the existing machine-id on startup.

Fixes coreos#1241 coreos#615
dongsupark pushed a commit to endocode/fleet that referenced this issue Apr 21, 2016 (earlier revisions pushed Apr 19 and Apr 20, 2016)
A new test TestDetectMachineId checks if an etcd registration fails
when a duplicated entry for /etc/machine-id gets registered to
different machines. Note that it's expected to fail in this case.

Goal of the test is to cover the improvement patch by @wuqixuan
("fleetd: Detecting the existing machine-id").

See also coreos#1288,
coreos#1241,
coreos#615.

Suggested-by: Olaf Buddenhagen <olaf@endocode.com>
Cc: wuqixuan <wuqixuan@huawei.com>
Cc: Djalal Harouni <djalal@endocode.com>