Don’t attempt to reconnect swarm on failed join after timeout #27123
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi is this only on master, or a fix for 1.12.1?
@thaJeztah I think this can wait for
LGTM
Can we consider changing
@aaronlehmann I remember @aluzzardi saw it as an important feature. Actually, it seems that swarmkit has already started to move away from that model: for example, in #26646 swarmkit no longer keeps trying to connect until the network comes back, but fails quite soon, so we never even reach the timeout anymore. I'm not sure whether this holds for all possible scenarios. If it does, then on the Docker side we should just remove the timeout completely, and swarmkit either has to join within a reasonable amount of time or give up with an error.
@aluzzardi: Any thoughts?
ping @aluzzardi PTAL!
@aaronlehmann Well, right now we're half sync, half async, and we all agreed to move one way or the other, since automation is currently really painful. I believe we had a chat offline a while ago where, if I remember correctly, we decided to go the async route (and make the CLI look synchronous?)
LGTM
LGTM
Fixes #26646.
The reproducible part of the bug was already fixed by the gRPC changes in swarmkit, but this change makes the fix more robust and removes its dependency on swarmkit timeouts.
The issue appeared because reconnecting expects state from the remote hosts, and there was no such state because the join had failed.
cc @mrjana