
Duplicate names in ETCD_INITIAL_CLUSTER not handled correctly #13757

Closed
mortehu opened this issue Mar 3, 2022 · 15 comments · Fixed by #14613
@mortehu

mortehu commented Mar 3, 2022

What happened?

If you don't pass a --name argument to your etcd processes, they will all have the name "default" and the cluster will operate normally. However, when you add a member, the generated ETCD_INITIAL_CLUSTER variable will have multiple entries with the name "default". When this environment variable is used, etcd parses these into a mapping with a single key ("default") holding multiple URLs, and creates a single member. See:

func NewClusterFromURLsMap(lg *zap.Logger, token string, urlsmap types.URLsMap, opts ...ClusterOption) (*RaftCluster, error) {
	c := NewCluster(lg, opts...)
	for name, urls := range urlsmap {
		m := NewMember(name, urls, token, nil)
		// ...

This leads to the confusing error message "member count is unequal". The documentation at https://etcd.io/docs/v3.5/op-guide/runtime-configuration/ mentions this error message, but it describes a different situation.
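To see the collapse in isolation, here is a minimal, self-contained Go program (not etcd code; the import path assumes etcd's v3.5 module layout) that parses the same string the reproduction below generates:

package main

import (
	"fmt"

	"go.etcd.io/etcd/client/pkg/v3/types"
)

func main() {
	m, err := types.NewURLsMap("default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001")
	if err != nil {
		panic(err)
	}
	// Prints 2, not 3: both "default" URLs are merged under a single map key,
	// so NewClusterFromURLsMap would create only two members.
	fmt.Println(len(m))
}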

What did you expect to happen?

Either

a. member add should fail, saying it cannot generate a valid ETCD_INITIAL_CLUSTER due to duplicate names (a sketch of such a check follows below), or
b. etcd should accept duplicate names in ETCD_INITIAL_CLUSTER and treat them as separate members. This can be accomplished by updating func NewClusterFromURLsMap as follows:

	c := NewCluster(lg, opts...)
	for name, urls := range urlsmap {
		for idx := range urls {
			m := NewMember(name, urls[idx:idx+1], token, nil)
			[...]

I don't know if there's a real need to be able to specify multiple URLs for a single member.
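For option (a), here is a minimal, self-contained sketch of such a pre-check. The helper name is hypothetical and only illustrates the idea; in etcd it would have to run before member add prints ETCD_INITIAL_CLUSTER.

package main

import "fmt"

// checkDuplicateNames is a hypothetical helper (not in etcd) that returns an
// error if any member name occurs more than once.
func checkDuplicateNames(names []string) error {
	seen := make(map[string]bool, len(names))
	for _, name := range names {
		if seen[name] {
			return fmt.Errorf("cannot generate a valid ETCD_INITIAL_CLUSTER: duplicate member name %q", name)
		}
		seen[name] = true
	}
	return nil
}

func main() {
	// The case from this report: two unnamed members plus the new member "c".
	fmt.Println(checkDuplicateNames([]string{"default", "c", "default"}))
}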

How can we reproduce it (as minimally and precisely as possible)?

You need three terminals, x, y, and z:

x$ mkdir -p test_case/{a,b,c}/{data/member,wal}
x$ ETCD_INITIAL_CLUSTER="a=http://127.0.0.1:40000,b=http://127.0.0.1:40001" ETCD_INITIAL_CLUSTER_STATE=new etcd --name a --{initial-advertise,listen}-peer-urls=http://127.0.0.1:40000 --{advertise,listen}-client-urls=http://127.0.0.1:50000 --data-dir test_case/a/data --wal-dir test_case/a/wal
y$ ETCD_INITIAL_CLUSTER="a=http://127.0.0.1:40000,b=http://127.0.0.1:40001" ETCD_INITIAL_CLUSTER_STATE=new etcd --name b --{initial-advertise,listen}-peer-urls=http://127.0.0.1:40001 --{advertise,listen}-client-urls=http://127.0.0.1:50001 --data-dir test_case/b/data --wal-dir test_case/b/wal
[now kill both servers with Ctrl-C]
x$ etcd --listen-peer-urls=http://127.0.0.1:40000 --{advertise,listen}-client-urls=http://127.0.0.1:50000 --data-dir test_case/a/data --wal-dir test_case/a/wal
y$ etcd --listen-peer-urls=http://127.0.0.1:40001 --{advertise,listen}-client-urls=http://127.0.0.1:50001 --data-dir test_case/b/data --wal-dir test_case/b/wal
z$ ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c http://127.0.0.1:40002
Added member named c with ID 7b4d6e3edb76bc59 to cluster

ETCD_NAME="c"
ETCD_INITIAL_CLUSTER="default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
ETCD_INITIAL_CLUSTER_STATE="existing"
z$ export ETCD_NAME="c"
z$ export ETCD_INITIAL_CLUSTER="default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"
z$ export ETCD_INITIAL_CLUSTER_STATE="existing"
z$ etcd --listen-peer-urls=http://127.0.0.1:40002 --{advertise,listen}-client-urls=http://127.0.0.1:50002 --data-dir test_case/c/data --wal-dir test_case/c/wal
[...]
member count is unequal
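(The ETCD_INITIAL_CLUSTER above parses into only two members -- "default", holding both peer URLs, and "c" -- while the remote cluster has three members after member add, hence the mismatch.)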

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

etcd 3.5.2

Relevant log output

No response

@ahrtr
Member

ahrtr commented Mar 3, 2022

A member can have multiple client or peer URLs -- for example, in a=http://127.0.0.1:2380,a=http://127.0.0.1:2381 the two URLs are parsed as two peer URLs of the single member a, not as two members. So in this case, you must specify the flag --name. But I agree that we should add a warning if the flag --name isn't present. Feel free to submit a PR for this. Thanks.

@Divya063

@ahrtr Would it be okay if I work on this?

@ahrtr
Member

ahrtr commented Apr 11, 2022

@Divya063 Definitely yes. Thank you!

@nisarg1499

nisarg1499 commented Apr 12, 2022

Hey @mortehu,
I was trying to reproduce the issue with the commands you gave.
First of all, I think the command for terminal z is wrong -> z$ ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c http://127.0.0.1:40002. It gave me the error: Error: too many arguments, did you mean --peer-urls=http://127.0.0.1:40002

After that I ran ETCDCTL_ENDPOINT=http://localhost:50000 etcdctl member add c --peer-urls=http://127.0.0.1:40002 and the output was as follows.

{"level":"warn","ts":"2022-04-12T00:20:06.341-0700","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_ENDPOINT=http://localhost:50000"}
Member f6f1fd0cdb6d6ac0 added to cluster cdf818194e3a8c32

ETCD_NAME="c"
ETCD_INITIAL_CLUSTER="default=http://localhost:2380,c=http://127.0.0.1:40002"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:40002"
ETCD_INITIAL_CLUSTER_STATE="existing"

After adding the member, I exported the required variables and executed the etcd command: etcd --listen-peer-urls=http://127.0.0.1:40002 --{advertise,listen}-client-urls=http://127.0.0.1:50002 --data-dir test_case/c/data --wal-dir test_case/c/wal, but I didn't get the "member count is unequal" error.

Instead, the error was:

{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"default=http://127.0.0.1:40000,c=http://127.0.0.1:40002,default=http://127.0.0.1:40001"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"existing"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_NAME","variable-value":"c"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--listen-peer-urls=http://127.0.0.1:40002","--advertise-client-urls=http://127.0.0.1:50002","--listen-client-urls=http://127.0.0.1:50002","--data-dir","test_case/c/data","--wal-dir","test_case/c/wal"]}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"etcdmain/etcd.go:116","msg":"server has already been initialized","data-dir":"test_case/c/data","dir-type":"member"}
{"level":"info","ts":"2022-04-12T00:21:32.887-0700","caller":"embed/etcd.go:121","msg":"configuring peer listeners","listen-peer-urls":["http://127.0.0.1:40002"]}
{"level":"info","ts":"2022-04-12T00:21:32.888-0700","caller":"embed/etcd.go:129","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:50002"]}
{"level":"info","ts":"2022-04-12T00:21:32.888-0700","caller":"embed/etcd.go:307","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"7d3ca1f51","go-version":"go1.18","go-os":"linux","go-arch":"amd64","max-cpu-set":12,"max-cpu-available":12,"member-initialized":false,"name":"c","data-dir":"test_case/c/data","wal-dir":"test_case/c/wal","wal-dir-dedicated":"test_case/c/wal","member-dir":"test_case/c/data/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","wait-cluster-ready-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://127.0.0.1:40002"],"advertise-client-urls":["http://127.0.0.1:50002"],"listen-client-urls":["http://127.0.0.1:50002"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"c=http://127.0.0.1:40002,default=http://127.0.0.1:40000,default=http://127.0.0.1:40001","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
{"level":"warn","ts":"2022-04-12T00:21:32.888-0700","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"test_case/c/data\" exist, but the permission is \"drwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"warn","ts":"2022-04-12T00:21:32.888-0700","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"test_case/c/data/member\" exist, but the permission is \"drwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"info","ts":"2022-04-12T00:21:32.888-0700","caller":"storage/backend.go:81","msg":"opened backend db","path":"test_case/c/data/member/snap/db","took":"82.44µs"}
{"level":"warn","ts":"2022-04-12T00:21:32.888-0700","caller":"schema/schema.go:43","msg":"Failed to detect storage schema version. Please wait till wal snapshot before upgrading cluster."}
{"level":"info","ts":"2022-04-12T00:21:33.006-0700","caller":"embed/etcd.go:383","msg":"closing etcd server","name":"c","data-dir":"test_case/c/data","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://127.0.0.1:50002"]}
{"level":"info","ts":"2022-04-12T00:21:33.006-0700","caller":"embed/etcd.go:385","msg":"closed etcd server","name":"c","data-dir":"test_case/c/data","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://127.0.0.1:50002"]}
{"level":"fatal","ts":"2022-04-12T00:21:33.006-0700","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:aab0e09a079f9f55 Members:[&{ID:33cf8d3d56df1746 RaftAttributes:{PeerURLs:[http://127.0.0.1:40000] IsLearner:false} Attributes:{Name:default ClientURLs:[http://127.0.0.1:50000]}} &{ID:8d0cef3f13600fd7 RaftAttributes:{PeerURLs:[http://127.0.0.1:40001] IsLearner:false} Attributes:{Name:default ClientURLs:[http://127.0.0.1:50001]}}] RemovedMemberIDs:[]}: PeerURLs: no match found for existing member (33cf8d3d56df1746, [http://127.0.0.1:40000]), last resolver error (len([\"http://127.0.0.1:40000\"]) != len([\"http://127.0.0.1:40000\" \"http://127.0.0.1:40001\"]))","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/home/nisarg1499/opensource/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/home/nisarg1499/opensource/etcd/server/etcdmain/main.go:40\nmain.main\n\t/home/nisarg1499/opensource/etcd/server/main.go:32\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

Can you tell me where I went wrong in reproducing the error? I followed the given commands for terminals x and y exactly.

@keremgocen

Hey, I'm looking for a beginner-friendly issue, if this one is available.

@ahrtr
Member

ahrtr commented Jun 14, 2022

Thanks @keremgocen, let's first double-check with @Divya063 to avoid duplicate work.

@Divya063 are you still working on this?

@nisarg1499

@keremgocen Do let me know if you are able to replicate the issue. I am also looking to work on some beginner-friendly issues. @ahrtr

@ahrtr
Member

ahrtr commented Jun 14, 2022

Can you tell me where I went wrong in reproducing the error? I followed the given commands for terminals x and y exactly.

Two comments:

  1. The environment variable should be ETCDCTL_ENDPOINTS instead of ETCDCTL_ENDPOINT;
  2. You need to start a cluster with multiple members, e.g. 3
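Putting the two corrections together, the member add step from the reproduction would become, for example:

z$ ETCDCTL_ENDPOINTS=http://localhost:50000 etcdctl member add c --peer-urls=http://127.0.0.1:40002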

@nisarg1499

Can you tell me where I went wrong in reproducing the error? I followed the given commands for terminals x and y exactly.

Two comments:

  1. The environment variable should be ETCDCTL_ENDPOINTS instead of ETCDCTL_ENDPOINT;
  2. You need to start a cluster with multiple members, e.g. 3

Thanks a lot for your reply. I'll check it.

@nic-chen
Contributor

Looks like no progress on this issue.

I would like to work on it.

@nic-chen
Contributor

I read the relevant code and found that Config.Name only plays an actual role when the member is started for the first time -- it is used to determine whether the member is local or remote: https://github.com/etcd-io/etcd/blob/main/server/etcdserver/cluster_util.go#L129

At other times it is just an identifier without any constraints; the same member can even be started with a different name each time.

So I am more inclined to accept duplicate names in ETCD_INITIAL_CLUSTER and treat them as separate members.

What's your opinion? Thanks! @serathius @ahrtr

@UtR491

UtR491 commented Oct 17, 2022

@nic-chen are you working on this?
@ahrtr I was able to reproduce the issue. If @nic-chen is not working on this, can I take it up? Also, which of the two approaches would you suggest for solving the issue?

@ahrtr
Member

ahrtr commented Oct 17, 2022

Just as I mentioned previously #13757 (comment), each member can have multiple peer URLs. In the following example, http://1.1.1.1:2380 and http://2.2.2.2:2380 are regarded as two peer URLs of the member mach0. I don't think we should change this existing behavior.

mach0=http://1.1.1.1:2380,mach0=http://2.2.2.2:2380,mach1=http://3.3.3.3:2380,mach2=http://4.4.4.4:2380

I think we just need to print a warning message if users do not provide a value for --name.
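A minimal sketch of such a warning (the helper name, its placement, and the message wording are assumptions for illustration, not the shipped fix):

package main

import "go.uber.org/zap"

// warnOnDefaultName only illustrates the suggested warning; "default" is
// etcd's fallback member name when --name is not set.
func warnOnDefaultName(lg *zap.Logger, name string) {
	if name == "default" {
		lg.Warn("member name is 'default' (--name not set); 'member add' will generate an ETCD_INITIAL_CLUSTER containing duplicate names")
	}
}

func main() {
	warnOnDefaultName(zap.NewExample(), "default")
}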

@nic-chen
Contributor

Just as I mentioned previously #13757 (comment), each member can have multiple peer URLs. In the following example, http://1.1.1.1:2380 and http://2.2.2.2:2380 are regarded as two peer URLs of the member mach0. I don't think we should change this existing behavior.

mach0=http://1.1.1.1:2380,mach0=http://2.2.2.2:2380,mach1=http://3.3.3.3:2380,mach2=http://4.4.4.4:2380

I think we just need to print a warning message if users do not provide a value for --name.

Thanks for the explanation! I missed that comment...

@nic-chen
Contributor

@nic-chen are you working on this? @ahrtr I was able to reproduce the issue. If @nic-chen is not working on this, can I take it up? Also, which of the two approaches would you suggest for solving the issue?

Hi @UtR491,

Sure. I have reproduced and fixed it locally; I just haven't finished testing, and I wanted to wait for a reply because I'm not that familiar with etcd.

A PR will be submitted this week.

If you can fix it and add test cases quickly, a PR is welcome; I wouldn't mind.
