etcd shouldn't permit duplicate node names in ETCD_INITIAL_CLUSTER #7927

Closed
alexzorin opened this issue May 15, 2017 · 6 comments

@alexzorin

In etcd 3.0 and 3.1, if a user provides an ETCD_INITIAL_CLUSTER that has duplicate names, e.g.

ETCD_INITIAL_CLUSTER="etcd=https://2.3.4.5:2380,etcd=https://1.2.3.4:2380,etcd=https://3.4.5.6:2380"

then etcd will happily accept this configuration, but will silently transform InitialPeerURLsMap into a map with a single entry holding whatever the final item is, i.e. map[string]string{"etcd":"https://3.4.5.6:2380"}.

This leads to some very confusing error messages down the line when trying to join a cluster, because InitialPeerURLsMap ends up being something very different to what the user was expecting.

etcd should probably reject such a configuration immediately, and perhaps enforce uniqueness in other places as well, since I had a perfectly functional cluster with duplicate names until I had to replace a node.
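
For illustration, a minimal sketch of the kind of up-front check suggested here. It is not etcd code: `findDuplicateNames` is a hypothetical helper, and it assumes the initial cluster value is a comma-separated list of `name=url` pairs as in the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// findDuplicateNames is a hypothetical helper (not part of etcd). It scans a
// comma-separated "name=url" list and returns each name that occurs more than
// once, in the order the repetition is first seen.
func findDuplicateNames(initialCluster string) []string {
	seen := make(map[string]int)
	var dups []string
	for _, pair := range strings.Split(initialCluster, ",") {
		name, _, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || name == "" {
			continue // malformed entries are out of scope for this sketch
		}
		seen[name]++
		if seen[name] == 2 {
			dups = append(dups, name)
		}
	}
	return dups
}

func main() {
	cluster := "etcd=https://2.3.4.5:2380,etcd=https://1.2.3.4:2380,etcd=https://3.4.5.6:2380"
	if dups := findDuplicateNames(cluster); len(dups) > 0 {
		fmt.Printf("initial cluster has duplicate member names: %v\n", dups)
	}
}
```

A check like this could run before the value is turned into a map, so the user would see the problem at startup rather than when joining a cluster later.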

@heyitsanthony
Contributor

Maybe etcd should treat this as multiple peer URLs for the one member? It definitely shouldn't drop input on the ground like that, at least. /cc @xiang90
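
As a rough sketch of that alternative, repeated names could be grouped into one member with several peer URLs instead of being overwritten. `groupPeerURLs` below is a hypothetical helper, not etcd's actual parser, and the sorting of URLs is an assumption made only to keep the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupPeerURLs is a hypothetical helper (not part of etcd). It collects every
// URL listed under the same name, so "etcd=a,etcd=b" yields one member named
// "etcd" with two peer URLs rather than a single surviving entry.
func groupPeerURLs(initialCluster string) map[string][]string {
	grouped := make(map[string][]string)
	for _, pair := range strings.Split(initialCluster, ",") {
		name, peerURL, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || name == "" || peerURL == "" {
			continue // malformed entries are out of scope for this sketch
		}
		grouped[name] = append(grouped[name], peerURL)
	}
	// Sort each member's URLs in place so the output does not depend on input order.
	for _, urls := range grouped {
		sort.Strings(urls)
	}
	return grouped
}

func main() {
	m := groupPeerURLs("etcd=http://127.0.0.1:2380,etcd=http://10.7.29.60:2380")
	fmt.Println(len(m["etcd"]), m["etcd"])
}
```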

@heyitsanthony heyitsanthony added this to the v3.3.0 milestone May 23, 2017
@heyitsanthony
Contributor

@alexzorin I'm not seeing this behavior.

I tried starting a new cluster:

./bin/etcd -name etcd -initial-cluster "etcd=http://127.0.0.1:2380,etcd=http://10.7.29.60:2380" --initial-advertise-peer-urls "http://127.0.0.1:2380,http://10.7.29.60:2380"

I see both peer addresses in InitialPeerURLsMap with some printf debugging:

2017-06-09 16:35:07.125761 I | etcdmain: etcd Version: 3.2.0-rc.1+git
2017-06-09 16:35:07.125950 I | etcdmain: Git SHA: 933aa09
2017-06-09 16:35:07.125957 I | etcdmain: Go Version: go1.8
2017-06-09 16:35:07.125962 I | etcdmain: Go OS/Arch: darwin/amd64
2017-06-09 16:35:07.125969 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-06-09 16:35:07.125977 N | etcdmain: failed to detect default host (default host not supported on darwin_amd64)
2017-06-09 16:35:07.125986 W | etcdmain: no data-dir provided, using default data-dir ./etcd.etcd
2017-06-09 16:35:07.126722 I | embed: listening for peers on http://localhost:2380
2017-06-09 16:35:07.126955 I | embed: listening for client requests on localhost:2379
INITIAL PEERURLSMAP in NewServer: etcd=http://10.7.29.60:2380,etcd=http://127.0.0.1:2380
2017-06-09 16:35:07.128339 I | etcdserver: name = etcd
2017-06-09 16:35:07.128353 I | etcdserver: data dir = etcd.etcd
2017-06-09 16:35:07.128358 I | etcdserver: member dir = etcd.etcd/member
2017-06-09 16:35:07.128361 I | etcdserver: heartbeat = 100ms
2017-06-09 16:35:07.128364 I | etcdserver: election = 1000ms
2017-06-09 16:35:07.128368 I | etcdserver: snapshot count = 100000
2017-06-09 16:35:07.128377 I | etcdserver: advertise client URLs = http://localhost:2379
2017-06-09 16:35:07.128384 I | etcdserver: initial advertise peer URLs = http://10.7.29.60:2380,http://127.0.0.1:2380
2017-06-09 16:35:07.128390 I | etcdserver: initial cluster = etcd=http://10.7.29.60:2380,etcd=http://127.0.0.1:2380
2017-06-09 16:35:07.211158 I | etcdserver: starting member 22730d60c7d1e6bc in cluster fe7127deeb881c7a
2017-06-09 16:35:07.211221 I | raft: 22730d60c7d1e6bc became follower at term 0
2017-06-09 16:35:07.211239 I | raft: newRaft 22730d60c7d1e6bc [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-06-09 16:35:07.211246 I | raft: 22730d60c7d1e6bc became follower at term 1
CLUSTER: {ClusterID:fe7127deeb881c7a Members:[&{ID:22730d60c7d1e6bc RaftAttributes:{PeerURLs:[http://10.7.29.60:2380 http://127.0.0.1:2380]} Attributes:{Name:etcd ClientURLs:[]}}] RemovedMemberIDs:[]}
2017-06-09 16:35:07.213054 W | auth: simple token is not cryptographically signed
2017-06-09 16:35:07.213562 I | etcdserver: starting server... [version: 3.2.0-rc.1+git, cluster version: to_be_decided]
2017-06-09 16:35:07.214576 E | etcdserver: cannot monitor file descriptor usage (cannot get FDUsage on darwin)
2017-06-09 16:35:07.215555 I | etcdserver/membership: added member 22730d60c7d1e6bc [http://10.7.29.60:2380 http://127.0.0.1:2380] to cluster fe7127deeb881c7a
2017-06-09 16:35:08.113146 I | raft: 22730d60c7d1e6bc is starting a new election at term 1
2017-06-09 16:35:08.113314 I | raft: 22730d60c7d1e6bc became candidate at term 2
2017-06-09 16:35:08.113362 I | raft: 22730d60c7d1e6bc received MsgVoteResp from 22730d60c7d1e6bc at term 2
2017-06-09 16:35:08.113392 I | raft: 22730d60c7d1e6bc became leader at term 2
2017-06-09 16:35:08.113404 I | raft: raft.node: 22730d60c7d1e6bc elected leader 22730d60c7d1e6bc at term 2
2017-06-09 16:35:08.113644 I | etcdserver: setting up the initial cluster version to 3.2
2017-06-09 16:35:08.128898 N | etcdserver/membership: set the initial cluster version to 3.2
2017-06-09 16:35:08.128930 I | etcdserver: published {Name:etcd ClientURLs:[http://localhost:2379]} to cluster fe7127deeb881c7a
2017-06-09 16:35:08.129027 I | embed: ready to serve client requests
2017-06-09 16:35:08.129279 I | etcdserver/api: enabled capabilities for version 3.2
2017-06-09 16:35:08.129505 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!

If I restart etcd with the same arguments, I see that the initial peers map is cleared out (but the information is already stored in the data directory):

2017-06-09 16:38:39.606789 I | etcdmain: etcd Version: 3.2.0-rc.1+git
2017-06-09 16:38:39.606877 I | etcdmain: Git SHA: 933aa09
2017-06-09 16:38:39.606880 I | etcdmain: Go Version: go1.8
2017-06-09 16:38:39.606882 I | etcdmain: Go OS/Arch: darwin/amd64
2017-06-09 16:38:39.606885 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-06-09 16:38:39.606890 N | etcdmain: failed to detect default host (default host not supported on darwin_amd64)
2017-06-09 16:38:39.606895 W | etcdmain: no data-dir provided, using default data-dir ./etcd.etcd
2017-06-09 16:38:39.607206 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-06-09 16:38:39.607544 I | embed: listening for peers on http://localhost:2380
2017-06-09 16:38:39.607670 I | embed: listening for client requests on localhost:2379
INITIAL PEERURLSMAP in NewServer:
2017-06-09 16:38:39.609101 I | etcdserver: name = etcd
2017-06-09 16:38:39.609111 I | etcdserver: data dir = etcd.etcd
2017-06-09 16:38:39.609114 I | etcdserver: member dir = etcd.etcd/member
2017-06-09 16:38:39.609117 I | etcdserver: heartbeat = 100ms
2017-06-09 16:38:39.609119 I | etcdserver: election = 1000ms
2017-06-09 16:38:39.609122 I | etcdserver: snapshot count = 100000
2017-06-09 16:38:39.609128 I | etcdserver: advertise client URLs = http://localhost:2379
2017-06-09 16:38:39.613432 I | etcdserver: restarting member 22730d60c7d1e6bc in cluster fe7127deeb881c7a at commit index 6
2017-06-09 16:38:39.613476 I | raft: 22730d60c7d1e6bc became follower at term 3
2017-06-09 16:38:39.613487 I | raft: newRaft 22730d60c7d1e6bc [peers: [], term: 3, commit: 6, applied: 0, lastindex: 6, lastterm: 3]
CLUSTER: {ClusterID:fe7127deeb881c7a Members:[] RemovedMemberIDs:[]}
2017-06-09 16:38:39.615477 W | auth: simple token is not cryptographically signed
2017-06-09 16:38:39.616031 I | etcdserver: starting server... [version: 3.2.0-rc.1+git, cluster version: to_be_decided]
2017-06-09 16:38:39.616124 E | etcdserver: cannot monitor file descriptor usage (cannot get FDUsage on darwin)
2017-06-09 16:38:39.617563 I | etcdserver/membership: added member 22730d60c7d1e6bc [http://10.7.29.60:2380 http://127.0.0.1:2380] to cluster fe7127deeb881c7a
2017-06-09 16:38:39.617663 N | etcdserver/membership: set the initial cluster version to 3.2
2017-06-09 16:38:39.617705 I | etcdserver/api: enabled capabilities for version 3.2
2017-06-09 16:38:40.014321 I | raft: 22730d60c7d1e6bc is starting a new election at term 3
2017-06-09 16:38:40.014498 I | raft: 22730d60c7d1e6bc became candidate at term 4
2017-06-09 16:38:40.015384 I | raft: 22730d60c7d1e6bc received MsgVoteResp from 22730d60c7d1e6bc at term 4
2017-06-09 16:38:40.015422 I | raft: 22730d60c7d1e6bc became leader at term 4
2017-06-09 16:38:40.015438 I | raft: raft.node: 22730d60c7d1e6bc elected leader 22730d60c7d1e6bc at term 4
2017-06-09 16:38:40.015690 I | embed: ready to serve client requests
2017-06-09 16:38:40.015764 I | etcdserver: published {Name:etcd ClientURLs:[http://localhost:2379]} to cluster fe7127deeb881c7a
2017-06-09 16:38:40.016197 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!

So I can't get it to drop nodes from the map while keeping the last one. What am I missing?

@alexzorin
Author

alexzorin commented Jun 12, 2017

I can't remember the exact reproduction now as it's been a while, but I believe the name flattening is still a problem both in v3.1.3 (where I had the actual production issue) and on master.

I think this is not quite the same error I had but is symptomatic of the same issue (a notable difference to your repro attempt is that I only provide one peer for ETCD_INITIAL_ADVERTISE_PEER_URLS).

I think that the final error doesn't make sense, and I think this is because of the duplicate node names inadvertently missing peers on the list.

Either renaming the duplicate node names (which was my production fix), or copying the peer list fully to the advertise-peer-urls variable (which you did in your repro) steps around the issue.

$ ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:2380" ETCD_INITIAL_CLUSTER="etcd=http://127.0.0.1:2380,etcd=http://10.7.29.60:2380" ./bin/etcd -name etcd
2017-06-12 11:03:44.719144 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=http://127.0.0.1:2380
2017-06-12 11:03:44.719213 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=etcd=http://127.0.0.1:2380,etcd=http://10.7.29.60:2380
2017-06-12 11:03:44.719350 I | etcdmain: etcd Version: 3.2.0-rc.1+git
2017-06-12 11:03:44.719362 I | etcdmain: Git SHA: 933aa09
2017-06-12 11:03:44.719371 I | etcdmain: Go Version: go1.8
2017-06-12 11:03:44.719380 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-12 11:03:44.719389 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-06-12 11:03:44.719405 W | etcdmain: no data-dir provided, using default data-dir ./etcd.etcd
2017-06-12 11:03:44.719751 I | embed: listening for peers on http://localhost:2380
2017-06-12 11:03:44.719896 I | embed: listening for client requests on localhost:2379
2017-06-12 11:03:44.725317 I | etcdmain: --initial-cluster must include etcd=http://127.0.0.1:2380 given --initial-advertise-peer-urls=http://127.0.0.1:2380

@heyitsanthony
Contributor

> I think that the final error doesn't make sense, and I think this is because of the duplicate node names inadvertently missing peers on the list.

What error? The --initial-cluster error is correct: initial peers are given that don't match the initial cluster. It's misconfigured, since the initial cluster has more peers for the node than are declared in its initial advertise peer addresses. I don't think etcd should try to correct broken user input.

Is there actually a bug here or can this be closed?

@heyitsanthony
Contributor

OK, I see the bug with --initial-cluster; it should be giving must include etcd=http://10.7.29.60:2380 instead of 127.0.0.1:2380 which is already provided. Will fix.
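
A minimal sketch of how the missing entries could be computed for a clearer message, assuming the member's URLs from --initial-cluster and from --initial-advertise-peer-urls are already available as string slices. `missingFrom` is a hypothetical helper and is not taken from the actual fix.

```go
package main

import (
	"fmt"
	"sort"
)

// missingFrom is a hypothetical helper (not part of etcd). It returns the
// entries of want that do not appear in have, sorted for stable output.
func missingFrom(want, have []string) []string {
	present := make(map[string]bool, len(have))
	for _, u := range have {
		present[u] = true
	}
	var missing []string
	for _, u := range want {
		if !present[u] {
			missing = append(missing, u)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// URLs listed for member "etcd" in the reproduction above.
	clusterURLs := []string{"http://127.0.0.1:2380", "http://10.7.29.60:2380"}
	advertiseURLs := []string{"http://127.0.0.1:2380"}

	fmt.Println("in initial cluster but not advertised:", missingFrom(clusterURLs, advertiseURLs))
	fmt.Println("advertised but not in initial cluster:", missingFrom(advertiseURLs, clusterURLs))
}
```

Reporting only the difference, alongside the full advertised list, points at the URL that actually needs attention (here 10.7.29.60) rather than one that is already present.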

heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jun 12, 2017
…se urls

The old error was not clear about what URLs needed to be added, sometimes
truncating the list. To make it clearer, print out the missing entries
for --initial-cluster and print the full list of initial advertise peers.

Fixes etcd-io#8079 and etcd-io#7927
@heyitsanthony
Contributor

Fixed by #8083, closing.

yudai pushed a commit to yudai/etcd that referenced this issue Oct 5, 2017
…se urls

The old error was not clear about what URLs needed to be added, sometimes
truncating the list. To make it clearer, print out the missing entries
for --initial-cluster and print the full list of initial advertise peers.

Fixes etcd-io#8079 and etcd-io#7927