Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manager: Fix hanging Stop method #2203

Merged
merged 1 commit into from
May 25, 2017

Conversation

aaronlehmann
Copy link
Collaborator

If raftNode.JoinAndStart failed, Stop will block forever because it waits for the manager to start up.

To fix this, close the "started" channel even if Run exits early due to an error. Fix the way the collector is initialized so its Stop method won't hang either.

Add a test that makes sure the node shuts down cleanly after a failed manager initialization.

cc @cyli

@cyli
Copy link
Contributor

cyli commented May 24, 2017

LGTM

@@ -361,7 +361,7 @@ func (n *Node) JoinAndStart(ctx context.Context) (err error) {
if err != nil {
n.stopMu.Lock()
// to shutdown transport
close(n.stopped)
n.cancelFunc()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with this part of the code, but I didn't follow why this change is needed.

Copy link
Contributor

@cyli cyli May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.cancelFunc already calls close(n.stopped), and ensures that it is only called once, because closing it twice causes a panic

I think this would make it safer to call manager.Stop() even after JoinAndStart errors, because if JoinAndStart errors the channel is closed, and manager.Stop() calls raft.Node.Cancel(), which calls n.cancelFunc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyli thanks for the explanation, I understand better now.

If raftNode.JoinAndStart failed, Stop will block forever because it
waits for the manager to start up.

To fix this, close the "started" channel even if Run exits early due to
an error. Fix the way the collector is initialized so its Stop method
won't hang either.

Add a test that makes sure the node shuts down cleanly after a failed
manager initialization.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@codecov
Copy link

codecov bot commented May 24, 2017

Codecov Report

Merging #2203 into master will decrease coverage by 0.24%.
The diff coverage is 80%.

@@            Coverage Diff             @@
##           master    #2203      +/-   ##
==========================================
- Coverage   60.19%   59.94%   -0.25%     
==========================================
  Files         119      119              
  Lines       19835    19849      +14     
==========================================
- Hits        11939    11898      -41     
- Misses       6549     6604      +55     
  Partials     1347     1347

Copy link
Contributor

@nishanttotla nishanttotla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mavenugo mavenugo mentioned this pull request May 24, 2017
23 tasks
@aaronlehmann aaronlehmann merged commit ffd009f into moby:master May 25, 2017
@aaronlehmann aaronlehmann deleted the manager-stop-hanging branch May 25, 2017 01:03
@aluzzardi aluzzardi added this to the 17.06 milestone May 31, 2017
aluzzardi added a commit to aluzzardi/docker-ce that referenced this pull request Jun 1, 2017
Includes moby/swarmkit#2203

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
tiborvass pushed a commit to aluzzardi/docker-ce that referenced this pull request Jun 6, 2017
Includes:
- moby/swarmkit#2203
- moby/swarmkit#2210
- moby/swarmkit#2212

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
Includes:
- moby/swarmkit#2203
- moby/swarmkit#2210
- moby/swarmkit#2212

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Feb 3, 2020
Includes:
- moby/swarmkit#2203
- moby/swarmkit#2210
- moby/swarmkit#2212

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 10, 2020
Includes:
- moby/swarmkit#2203
- moby/swarmkit#2210
- moby/swarmkit#2212

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 23, 2020
Includes:
- moby/swarmkit#2203
- moby/swarmkit#2210
- moby/swarmkit#2212

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
Signed-off-by: Tibor Vass <tibor@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants