Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix quorum calculation when promoting a learner member #11640

Merged
merged 1 commit into from
Feb 21, 2020
Merged

Fix quorum calculation when promoting a learner member #11640

merged 1 commit into from
Feb 21, 2020

Conversation

ereslibre
Copy link
Contributor

When promoting a learner member we should not count already a voting
member, but take only into account the number of existing voting
members and their current status (started, unstarted) when taking the
decision whether a learner member can be promoted.

Before this change, it was impossible to grow from a quorum N to a N+1
through promoting a learning member.

Fixes: #11633


Note: it would be good to add tests to

etcd/etcdserver/server.go

Lines 1490 to 1520 in b64b36a

func (s *EtcdServer) promoteMember(ctx context.Context, id uint64) ([]*membership.Member, error) {
if err := s.checkMembershipOperationPermission(ctx); err != nil {
return nil, err
}
// check if we can promote this learner.
if err := s.mayPromoteMember(types.ID(id)); err != nil {
return nil, err
}
// build the context for the promote confChange. mark IsLearner to false and IsPromote to true.
promoteChangeContext := membership.ConfigChangeContext{
Member: membership.Member{
ID: types.ID(id),
},
IsPromote: true,
}
b, err := json.Marshal(promoteChangeContext)
if err != nil {
return nil, err
}
cc := raftpb.ConfChange{
Type: raftpb.ConfChangeAddNode,
NodeID: id,
Context: b,
}
return s.configure(ctx, cc)
}
and

etcd/etcdserver/server.go

Lines 1522 to 1543 in b64b36a

func (s *EtcdServer) mayPromoteMember(id types.ID) error {
lg := s.getLogger()
err := s.isLearnerReady(uint64(id))
if err != nil {
return err
}
if !s.Cfg.StrictReconfigCheck {
return nil
}
if !s.cluster.IsReadyToPromoteMember(uint64(id)) {
lg.Warn(
"rejecting member promote request; not enough healthy members",
zap.String("local-member-id", s.ID().String()),
zap.String("requested-member-remove-id", id.String()),
zap.Error(ErrNotEnoughStartedMembers),
)
return ErrNotEnoughStartedMembers
}
return nil
}
as well, so we test other promotion traits. As far as I could see there are no tests at all regarding learning member promotion (or I couldn't find them).

@codecov-io
Copy link

Codecov Report

Merging #11640 into master will decrease coverage by 0.27%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11640      +/-   ##
==========================================
- Coverage   66.28%      66%   -0.28%     
==========================================
  Files         401      401              
  Lines       36629    36629              
==========================================
- Hits        24279    24177     -102     
- Misses      10861    10940      +79     
- Partials     1489     1512      +23
Impacted Files Coverage Δ
etcdserver/api/membership/cluster.go 84.79% <100%> (+3.75%) ⬆️
clientv3/ordering/kv.go 8.82% <0%> (-76.48%) ⬇️
client/client.go 42.81% <0%> (-27.78%) ⬇️
auth/simple_token.go 68.06% <0%> (-21.01%) ⬇️
auth/range_perm_cache.go 51.42% <0%> (-17.15%) ⬇️
proxy/grpcproxy/watcher.go 85.71% <0%> (-8.17%) ⬇️
clientv3/leasing/util.go 91.66% <0%> (-6.67%) ⬇️
clientv3/namespace/watch.go 87.87% <0%> (-6.07%) ⬇️
clientv3/concurrency/mutex.go 62.16% <0%> (-5.41%) ⬇️
raft/tracker/inflights.go 91.83% <0%> (-4.09%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d6a3c99...63dc7f2. Read the comment docs.

@ereslibre
Copy link
Contributor Author

ereslibre commented Feb 19, 2020

I went back and forth because on one side I wanted to behave somewhat similar to IsReadyToAddVotingMember, but this method obviously assumes that a new member is unstarted (something that is not happening when promoting a member -- when we reach this point we not only know that the learner member is started, but also that is synced with the leader), so I think strictly speaking having a specific case for 1-size clusters is not the best solution for the promote case.

That's why I decided to instead just count the voting members (instead of voting members + 1 as the IsReadyToAddVotingMember).

@jingyih
Copy link
Contributor

jingyih commented Feb 20, 2020

Regarding tests for promoting a member, we have some integration test cases here [1]. It will be great if you can add any missing test cases.

[1] https://github.com/etcd-io/etcd/blob/master/clientv3/integration/cluster_test.go

When promoting a learner member we should not count already a voting
member, but take only into account the number of existing voting
members and their current status (started, unstarted) when taking the
decision whether a learner member can be promoted.

Before this change, it was impossible to grow from a quorum N to a N+1
through promoting a learning member.

Fixes: #11633
Copy link
Contributor

@jingyih jingyih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@WIZARD-CXY
Copy link
Contributor

LGTM

@@ -857,3 +857,90 @@ func TestIsReadyToRemoveVotingMember(t *testing.T) {
}
}
}

func TestIsReadyToPromoteMember(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgmt nice job @ereslibre thanks!

@spzala spzala merged commit 8c1a820 into etcd-io:master Feb 21, 2020
@ereslibre ereslibre deleted the fix-learner-promotion branch February 21, 2020 17:37
jingyih added a commit that referenced this pull request Feb 23, 2020
…0-upstream-release-3.4

Automated cherry pick of #11640 on release-3.4
jingyih added a commit that referenced this pull request Feb 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Question: promote learning member on a 1 size cluster
5 participants