-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow managers not to expose a remote API port #1826
Conversation
Awesome! This also solves many issues we had with the local agent not connecting to the local manager, such as status drifting (e.g. Node both "down" and "reachable"). |
@aaronlehmann can we split it into different prs for easier reviews? Connbroker change permeates almost all files, so it's hard to find other changes. |
Current coverage is 53.38% (diff: 50.88%)@@ master #1826 diff @@
==========================================
Files 107 107
Lines 18403 18489 +86
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 9880 9871 -9
- Misses 7308 7396 +88
- Partials 1215 1222 +7
|
@LK4D4: This PR is split into 3 commits. I thought reviewing commit by commit would make sense, but if you think it's easier, I can split this into multiple PRs. Would you like me to open a separate PR with just the connection broker commit, or would you like me to split that one into smaller pieces? |
If you'd like, I could split it into a sequence of commits and/or PRs like this:
|
@aaronlehmann splitting plan will make it way easier for me to review. I think it's better to open different PRs. |
OK. If it's alright with you, I'll leave this one open as a complete PR to show the big picture, and also open one PR at a time to get the individual pieces merged. |
nice! design sgtm |
d1c9d94
to
a9b4a97
Compare
Split the commits as discussed, and opened #1828 to review/merge the first piece. I've seen a few integration test failures with just this first one locally, but it's tricky to reproduce and I can't understand why this change would cause issues (since it's just exposing some services in a way that nothing uses yet). I wonder if the failures are related to a recent change on master instead of the changes here. |
I have a theory about why this PR makes the integration tests flaky. I think it changes the timing enough that a demoted node can end up with a worker certificate before Raft creates outgoing connections. Then the manager blocks in I'm going to open a PR to change demotion so that the manager is always shut down explicitly after a role change instead of shutting itself down. Even if that doesn't fix the problem here, it's something that badly needs to be done. |
WIP PR at #1829. Unfortunately, it doesn't work yet. |
Design LGTM. |
a9b4a97
to
dbde084
Compare
I've rebased this on top of #1829 and made a few fixes. #1829 introduced an interesting issue. With this PR, a node that's a manager will always connect to itself instead of a random other manager for dispatcher and CA operations. However, if it has been demoted, it may not be able to issue certificates, and the CA client would not retry with other managers, so the demotion would not complete. I've fixed this so that after the first attempt to renew a certificate, the client will explicitly prefer random managers over the local connection. Also, a timeout was necessary for the certificate renewal request. If the node was demoted, it can get into a state where it waits forever for a leader. This seems quite stable now. I'm not seeing any more issues with the integration tests. I'll keep this open as a meta-PR, and update it as individual bits get merged. #1829 should be merged before #1828, and after that I can open more PRs for the other pieces. |
827f310
to
7f69ecb
Compare
#1829 was merged. Rebased. |
7f69ecb
to
02cf17a
Compare
000704c
to
9cb5f3c
Compare
@@ -132,9 +131,15 @@ type Manager struct { | |||
|
|||
cancelFunc context.CancelFunc | |||
|
|||
mu sync.Mutex | |||
mu sync.Mutex | |||
addrMu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document what these two mutexes control in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's more obvious when i look at the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments
if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil { | ||
l := <-m.controlListener | ||
l.Close() | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is closing the control listener and then returning an error, if binding the remote fails, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's safe to read from m.controlListener
because this is before Run
, so nothing else will be reading from that channel.
started chan struct{} | ||
stopped bool | ||
|
||
remoteListener chan net.Listener | ||
controlListener chan net.Listener |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on why these listeners are returned through channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see now; they're channels so the listeners (most importantly, the remote listener) can be hot-swapped, like when we take a "closed" one-node cluster and make it "open"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
} | ||
|
||
// BindRemote binds a port for the remote API. | ||
func (m *Manager) BindRemote(ctx context.Context, addrs RemoteAddrs) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it may make more sense to put these methods definitions in the order that they're called in the initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Order changed
defer cancel() | ||
|
||
isLeader := atomic.LoadUint32(&n.signalledLeadership) == 1 | ||
for !isLeader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this loop does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, currently only the leader can submit proposals such as configuration changes. For now, this SetAddr
function is intended to address the narrow use case of a single-node cluster that didn't have ports bound before. Since the cluster only has one member, this node must be the leader, but there can be delays before that leadership is confirmed, for example at startup. This loop waits until the current node becomes the leader.
If/when we extend SetAddr
to handle more general cases, we'll need some other approach here, such as an RPC we can use to tell the leader to update a node's address.
@@ -71,7 +71,7 @@ type Config struct { | |||
|
|||
// RemoteAPI is a listening address for serving the remote API, and | |||
// an optional advertise address. | |||
RemoteAPI RemoteAddrs | |||
RemoteAPI *RemoteAddrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity why the change to a pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it can be nil
before it's initialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant why not just check the default value, as opposed to nil? If ListenAddr
is ""
, there is no remote api set, right? I'm not arguing that's better or anything - more just asking pointers are generally preferred over checking against the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're suggesting should work fine. I felt that since RemoteAddrs
is a struct, it might be confusing to check one particular field to see if the struct is initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, thank you for explaining
newMember.RaftMember.Addr = newAddr | ||
c.members[id] = &newMember | ||
|
||
if oldMember.RaftMember.Addr != newAddr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: maybe we can move the copying of the RaftMember
object inside this block, since we don't need to make an update if the address hasn't changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
31adb48
to
e3b61ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
n.opts.Addr = addr | ||
|
||
if n.IsMember() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to just return here on !n.IsMember()
leadershipCh, cancel := n.SubscribeLeadership() | ||
defer cancel() | ||
|
||
isLeader := atomic.LoadUint32(&n.signalledLeadership) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signalledLeadership
is really weird name now :) we should call it isLeader
and use it instead of isLeader
function. Otherwise, it's quite confusing here.
Probably in another PR.
e3b61ab
to
a7ece0b
Compare
Rebased, PTAL |
eead57f
to
ae6b1df
Compare
Rebased again, and added This should be ready to merge. |
if m.config.ControlAPI != "" { | ||
return errors.New("manager already has a control API address") | ||
} | ||
m.config.ControlAPI = addr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should clear this in case of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
if leadershipChange == IsLeader { | ||
isLeader = true | ||
} | ||
case <-ctx.Done(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but seems like this ctx might be context.Background()
and it's possible that node could be stopped during this loop. Maybe we should use WithContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to WithContext
.
@@ -1092,6 +1151,11 @@ func (n *Node) reportNewAddress(ctx context.Context, id uint64) error { | |||
if err != nil { | |||
return err | |||
} | |||
if oldAddr == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I encountered it in the late-binding integration test. It happens when you create a manager without a bound port, then later bind one. The node update that adds the address gets committed to raft at some point, and new nodes that join the cluster may take a little while to catch up to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's quite weird. newPeer
calls grpc.Dial
which is supposed to check if address is empty :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, should we avoid calling grpc.Dial
if newPeer
is passed an empty address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno, there shouldn't be empty addresses and I wonder why it works and registers in peer list at all.
m.config.RemoteAPI = nil | ||
// The context isn't used in this case (before (*Manager).Run). | ||
if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil { | ||
l := <-m.controlListener |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can hang forever. So probably you need to check len(l).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks wrong. This code should only run if config.ControlAPI != ""
. I've fixed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a use case where there is a remote API, but no control API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't currently any use case without a control API. It's structured that way to be consistent with BindRemote
.
if config.ControlAPI != "" { | ||
m.config.ControlAPI = "" | ||
if err := m.BindControl(config.ControlAPI); err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we close listener here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If BindControl
returns an error, there is no listener to close.
@@ -143,6 +143,7 @@ func (c *Cluster) UpdateMember(id uint64, m *api.RaftMember) error { | |||
return nil | |||
} | |||
oldMember.RaftMember = m | |||
c.broadcastUpdate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
ae6b1df
to
970f237
Compare
@@ -342,6 +398,9 @@ func (n *Node) JoinAndStart(ctx context.Context) (err error) { | |||
} | |||
|
|||
// join to existing cluster | |||
if n.opts.Addr == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand events sequence. What ensures that BindRemote
is called before Run
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not need to be called before Run
. See the new test TestServiceCreateLateBind
, where it is called afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I don't understand how does it work. Isn't it possible to call JoinAndStart
before BindRemote
and get attempted to join raft cluster without knowing own address
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to bind first if you are trying to join a cluster, but if you're starting a new cluster then you hit the n.opts.JoinAddr == ""
branch and having an address is not required.
This adds BindRemote and BindControl methods to Manager, which can be used to specify the remote API and control API addresses after creating or starting the manager. Raft has been updated to accept a new remote address after being started. If The RemoteAPI and ControlAPI fields are passed to manager.New, it is not necessary to call BindRemote or BindControl explicitly. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
970f237
to
0eca203
Compare
LGTM |
We'd like Docker to support swarm services without needing to run
swarm init
. One of the preconditions for this is that the swarm manager needs to operate and be useful without exposing a port to the outside world.This PR is split into multiple parts, which will each become a separate PR for ease of review. The first few commits are focused on allowing the agent and certificate issuance/renewal to work without relying on a TCP connection to back to the same process. This works by exposing servers such as the dispatcher and node CA on the unix socket (or equivalent), and using that instead of a TCP connection. In theory we could codegen something to patch the calls straight through to the handlers (with some stuff injected onto the context to identify the caller as the local node), but for now, using the unix socket is good enough. This required some changes to the raft proxy to let it inject the local node identity onto the context when it's calling the handler locally.
connectionbroker
package. This is a small abstraction on top ofRemotes
that provides a gRPC connection to a manager. If running on a manager, it uses the unix socket, otherwise it will pick a remote manager usingRemotes
. This allows things like agent and certificate renewal to work even on a single node with no TCP port bound. (Add connectionbroker package #1850)connectionbroker
. (Convert code to use connectionbroker package #1851)Manager
andNode
that allow binding a port after the manager is already running. A small change to raft allows the raft code to get the machine's external address after raft is already running, instead of relying on having it already. (This PR)cc @aluzzardi @LK4D4 @diogomonica @cyli @tonistiigi @stevvooe