Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode #3543

Merged
merged 1 commit into from
Sep 18, 2015

Conversation

mitake
Copy link
Contributor

@mitake mitake commented Sep 16, 2015

Like the commit 6974fc6, this commit lets etcdserver forbid
removing started member if quorum cannot be preserved after
reconfiguration if the option -strict-reconfig-check is passed to
etcd. The removal can cause deadlock if unstarted members have wrong
peer URLs.

@mitake
Copy link
Contributor Author

mitake commented Sep 16, 2015

Tests seem to be failing, I'll fix it ASAP.

@mitake
Copy link
Contributor Author

mitake commented Sep 16, 2015

Fixed tests, they are incorrect because of my misunderstanding.

@@ -451,6 +451,7 @@ func TestApplyConfChangeError(t *testing.T) {
srv := &EtcdServer{
r: raftNode{Node: n},
cluster: cl,
cfg: &ServerConfig{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we move it to the first line of EtcdServer struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the cfg be placed at first line? Seems that after r (raftNode) would be suitable for the member. How do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to follow the field order in the struct: https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L144
So the best place should be the second line actually. (I remembered the wrong order when i left the comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I got it.

@yichengq
Copy link
Contributor

LGTM. defer to @xiang90

{
// 1/2 members ready, should fail
[]*Member{
newTestMember(1, nil, "1", nil),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the second member? the comment says 1/2 members are ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it is clearly my mistake... I'll fix it soon.

@xiang90
Copy link
Contributor

xiang90 commented Sep 17, 2015

LGTM after fixing the comment.

…ved in strict reconfig mode

Like the commit 6974fc6, this commit lets etcdserver forbid
removing started member if quorum cannot be preserved after
reconfiguration if the option -strict-reconfig-check is passed to
etcd. The removal can cause deadlock if unstarted members have wrong
peer URLs.
@mitake
Copy link
Contributor Author

mitake commented Sep 18, 2015

Updated based on the comments.

@yichengq
Copy link
Contributor

LGTM

yichengq added a commit that referenced this pull request Sep 18, 2015
etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode
@yichengq yichengq merged commit cedad49 into etcd-io:master Sep 18, 2015
@mitake
Copy link
Contributor Author

mitake commented Sep 18, 2015

Thanks for your review and merge, @xiang90 @yichengq . BTW, ideally this checking logic should be moved to raft side as we discussed before. When is the next major version up planned (which will not provide rolling update)?

@yichengq
Copy link
Contributor

@mitake My thought is to keep the code as it today and not move it to raftlog apply side. Several reasons:

  1. current solution is good enough to avoid normal related misoperations. It is almost as strict as(though not that strict) checking it at apply stage.
  2. rolling update is important

@xiang90
Copy link
Contributor

xiang90 commented Sep 18, 2015

@mitake I think the next step is to treat an inactive member the same as a unstarted one. It makes the checking better.

@mitake
Copy link
Contributor Author

mitake commented Sep 24, 2015

@yichengq @xiang90 thanks for your comments.

@xiang90 seems very interesting direction. I'll work on it ASAP.

mitake added a commit to mitake/etcd that referenced this pull request Nov 6, 2015
…nfig

-strict-reconfig-check option ensures a quorum cannot be losed by
membership reconfiguration even if unstarted nodes are misconfigured
(e.g. typos in peerURLs). As suggested in [1], this commit lets
isReadyToAddMember() not to count unreachable nodes as started nodes.

[1] etcd-io#3543 (comment)
mitake added a commit to mitake/etcd that referenced this pull request Nov 25, 2015
…nfig

-strict-reconfig-check option ensures a quorum cannot be losed by
membership reconfiguration even if unstarted nodes are misconfigured
(e.g. typos in peerURLs). As suggested in [1], this commit lets
isReadyToAddMember() not to count unreachable nodes as started nodes.

[1] etcd-io#3543 (comment)
mitake added a commit to mitake/etcd that referenced this pull request Nov 25, 2015
…nfig

-strict-reconfig-check option ensures a quorum cannot be losed by
membership reconfiguration even if unstarted nodes are misconfigured
(e.g. typos in peerURLs). As suggested in [1], this commit lets
isReadyToAddMember() not to count unreachable nodes as started nodes.

[1] etcd-io#3543 (comment)
@mitake mitake deleted the reconfig-remove branch December 30, 2015 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants