-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: set n.Node and n.isMember to correct value in stop/removeNode #1288
Conversation
Current coverage is 55.18% (diff: 67.74%)@@ master #1288 diff @@
==========================================
Files 80 80
Lines 12562 12565 +3
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 6933 6934 +1
- Misses 4674 4681 +7
+ Partials 955 950 -5
|
@runshenzhu can't we just set isMember to 0? Also with mutexes atomic sorta loses its original value, so isMember can be just bool. |
@LK4D4 I'm not sure if an atomic value could prevent it from race, without using lock. Suppose that func (n *Node) IsLeader() bool {
// n.isMember is 1, but node is stopping
if !n.IsMember() || n.Node == nil {
return false
}
// node is stopped, and n.isMember is set to 0
if n.Node.Status().Lead == n.Config.ID {
return true
}
return false
} |
@runshenzhu I'm pretty sure that atomic can't prevent it, that's why I propose to replace it with simple bool and use it as guard. |
updated: set @LK4D4 I tried to follow your suggestion of removing atomic but then go complains about race of accessing And we can't simply use lock to protect it, because if the leaving node is leader, a read lock is already held in Here I keep |
@@ -486,7 +492,7 @@ func (n *Node) stop() { | |||
|
|||
// IsLeader checks if we are the leader or not | |||
func (n *Node) IsLeader() bool { | |||
if !n.IsMember() { | |||
if !n.IsMember() || n.Node == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we still need nil check?
I still think that those checks need some more thoughts. |
|
||
if n.Node != nil { | ||
n.Stop() | ||
n.Node = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is setting Node
to nil still necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaronlehmann I think n.IsStopped()
needs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, still confused about this. Now that we set both n.Node = nil
and isMember
to 0, isn't IsStopped
basically the same thing as IsMember
? I don't understand why we need both. It looks like the only place we use only IsStopped
is in LeaderAddr
, but I don't understand why that couldn't use IsMember
instead.
As @LK4D4 mentioned earlier, I don't understand why |
@aaronlehmann It will be a little complicated to use Using atomic will simplify the case. |
update: close |
n.removeRaftOnce.Do(func() { | ||
atomic.StoreUint32(&n.isMember, 0) | ||
close(n.removeRaftCh) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably factor the n.removeRaftOnce.Do
call into a function, since the same code appears in two places.
|
@aaronlehmann updated to use RLock(), and refactor it to remove redundant code. |
@runshenzhu: Any thoughts on this comment? #1288 (comment) |
@aaronlehmann As a quick fix, |
update: using read lock to protect This is a quick fix for #1288 comment. For a long term, we should figure out a better way to make sure |
Sounds good. You can just call it |
ac00684
to
dcebb13
Compare
update:
|
LGTM |
It sounds like it can be a serious issue. But I don't know if it should be included in minor release. |
Let's include #1310 first and then test this one on top of it. |
@runshenzhu would you mind to rebase? |
rebased. @LK4D4 @aaronlehmann @abronan PTAL |
Do not merge yet, seeing a panic when stopping nodes, probably a race on setting the node to nil (1 on 3 crashed):
|
The panic above is because I'm not convinced that we should set |
2b5aca5
to
0fb4cb7
Compare
One concern of using Updated to use But I'm wondering that should we check if node is stopped in |
@@ -963,10 +990,8 @@ func (n *Node) IsMember() bool { | |||
|
|||
// IsStopped checks if the raft node is stopped or not | |||
func (n *Node) IsStopped() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we remove this function completely?
Signed-off-by: Runshen Zhu <runshen.zhu@gmail.com>
updated to remove |
defer n.stopMu.RUnlock() | ||
|
||
if !n.IsMember() { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
notice that if node is stopped, nil
is returned. I think this check is necessary, as @abronan's test shown above, this method could be called after node is stopped. In this case, without the check, n.Node.Status()
is executed after raft node is stopped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems okay.
LGTM |
1 similar comment
LGTM |
LOL, there is a very long fix/conversation list for this PR. Thanks for everyone's patient help, comment and review. |
while
n.IsStopped()
returns value by checking ifn.Node
is nil, we didn't setn.Node
correctly inn.stop()
. This PR fixes it by usingisMember
to tell if the node is stopped.Also, it should fix #1272.
ping @abronan @LK4D4
Signed-off-by: Runshen Zhu runshen.zhu@gmail.com