Allow managers not to expose a remote API port #1826

aaronlehmann · 2016-12-21T03:03:20Z

We'd like Docker to support swarm services without needing to run swarm init. One of the preconditions for this is that the swarm manager needs to operate and be useful without exposing a port to the outside world.

This PR is split into multiple parts, each of which will become a separate PR for ease of review. The first few commits are focused on allowing the agent and certificate issuance/renewal to work without relying on a TCP connection back to the same process. This works by exposing servers such as the dispatcher and node CA on the unix socket (or equivalent), and using that instead of a TCP connection. In theory we could codegen something to patch the calls straight through to the handlers (with some stuff injected onto the context to identify the caller as the local node), but for now, using the unix socket is good enough. This required some changes to the raft proxy to let it inject the local node identity onto the context when it's calling the handler locally.

Fix demotion (No automatic manager shutdown on demotion/removal #1829)
Expose the dispatcher, node CA, and a few other services on the UNIX socket. Inject the local node identity when they receive connections this way. (Expose needed services on the control socket #1828)
Add a connectionbroker package. This is a small abstraction on top of Remotes that provides a gRPC connection to a manager. If running on a manager, it uses the unix socket; otherwise, it picks a remote manager using Remotes. This allows things like the agent and certificate renewal to work even on a single node with no TCP port bound. (Add connectionbroker package #1850)
Convert the agent and CA client to use connectionbroker. (Convert code to use connectionbroker package #1851)
Separate binding ports from creating the manager. Add methods to Manager and Node that allow binding a port after the manager is already running. A small change to raft allows the raft code to get the machine's external address after raft is already running, instead of requiring it at startup. (This PR)
A simple integration test that binds a port after starting the first manager. (This PR)
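The connectionbroker choice described above (local socket when running on a manager, a remote manager otherwise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: `Broker`, `Conn`, `Select`, and `SelectRemote` here are simplified hypothetical stand-ins, real gRPC dialing is omitted, and a round-robin cursor replaces whatever selection logic Remotes actually uses.

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a hypothetical stand-in for a gRPC client connection.
type Conn struct {
	Target string // address the connection points at
	Local  bool   // true when it is this node's own control socket
}

// Broker is a hypothetical, heavily simplified connection broker.
type Broker struct {
	localSocket string   // set only when this node runs a manager
	remotes     []string // known remote managers (stand-in for Remotes)
	next        int      // round-robin cursor, standing in for Remotes' own selection
}

// Select prefers the local socket when this node is a manager and
// otherwise falls back to a remote manager.
func (b *Broker) Select() (*Conn, error) {
	if b.localSocket != "" {
		return &Conn{Target: b.localSocket, Local: true}, nil
	}
	return b.SelectRemote()
}

// SelectRemote ignores the local socket and always picks a remote manager.
func (b *Broker) SelectRemote() (*Conn, error) {
	if len(b.remotes) == 0 {
		return nil, errors.New("no remote managers known")
	}
	addr := b.remotes[b.next%len(b.remotes)]
	b.next++
	return &Conn{Target: addr}, nil
}

func main() {
	// Example values only; the socket path and addresses are made up.
	manager := &Broker{localSocket: "/tmp/control.sock", remotes: []string{"10.0.0.2:4242"}}
	worker := &Broker{remotes: []string{"10.0.0.2:4242", "10.0.0.3:4242"}}

	if c, err := manager.Select(); err == nil {
		fmt.Println(c.Target, c.Local) // /tmp/control.sock true
	}
	if c, err := worker.Select(); err == nil {
		fmt.Println(c.Target, c.Local) // 10.0.0.2:4242 false
	}
}
```

Keeping a separate "remote only" path matters later in this thread, where a demoted manager has to stop talking to itself.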

cc @aluzzardi @LK4D4 @diogomonica @cyli @tonistiigi @stevvooe

aluzzardi · 2016-12-21T16:39:43Z

/cc @ehazlett @icecrime

aluzzardi · 2016-12-21T16:43:32Z

Awesome!

This also solves many issues we had with the local agent not connecting to the local manager, such as status drifting (e.g. Node both "down" and "reachable").

LK4D4 · 2016-12-21T17:25:34Z

@aaronlehmann can we split it into different PRs for easier reviews? The connbroker change permeates almost all files, so it's hard to find other changes.

codecov-io · 2016-12-21T17:26:11Z

Current coverage is 53.38% (diff: 50.88%)

Merging #1826 into master will decrease coverage by 0.29%

@@             master      #1826   diff @@
==========================================
  Files           107        107          
  Lines         18403      18489    +86   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           9880       9871     -9   
- Misses         7308       7396    +88   
- Partials       1215       1222     +7

Powered by Codecov. Last update 9a7d5e6...0eca203

aaronlehmann · 2016-12-21T17:53:32Z

@LK4D4: This PR is split into 3 commits. I thought reviewing commit by commit would make sense, but if you think it's easier, I can split this into multiple PRs.

Would you like me to open a separate PR with just the connection broker commit, or would you like me to split that one into smaller pieces?

aaronlehmann · 2016-12-21T17:56:29Z

If you'd like, I could split it into a sequence of commits and/or PRs like this:

Expose dispatcher/NodeCA/etc on local socket, modify proxy to support this.
Add connection broker package
Update existing code to use connection broker instead of remotes
Add late-binding support to manager and node
Add integration test

LK4D4 · 2016-12-21T17:59:28Z

@aaronlehmann splitting plan will make it way easier for me to review. I think it's better to open different PRs.
Thanks!

aaronlehmann · 2016-12-21T18:05:34Z

OK. If it's alright with you, I'll leave this one open as a complete PR to show the big picture, and also open one PR at a time to get the individual pieces merged.

ehazlett · 2016-12-21T18:22:01Z

nice! design sgtm

aaronlehmann · 2016-12-21T19:49:30Z

Split the commits as discussed, and opened #1828 to review/merge the first piece. I've seen a few integration test failures with just this first one locally, but it's tricky to reproduce and I can't understand why this change would cause issues (since it's just exposing some services in a way that nothing uses yet). I wonder if the failures are related to a recent change on master instead of the changes here.

aaronlehmann · 2016-12-21T20:41:35Z

I have a theory about why this PR makes the integration tests flaky. I think it changes the timing enough that a demoted node can end up with a worker certificate before Raft creates outgoing connections. Then the manager blocks in WaitForLeader instead of shutting down on demotion like it's supposed to.

I'm going to open a PR to change demotion so that the manager is always shut down explicitly after a role change instead of shutting itself down. Even if that doesn't fix the problem here, it's something that badly needs to be done.

aaronlehmann · 2016-12-22T01:59:35Z

WIP PR for the demotion change is at #1829. Unfortunately, it doesn't work yet.

diogomonica · 2016-12-28T15:33:00Z

Design LGTM.

aaronlehmann · 2017-01-05T03:10:36Z

I've rebased this on top of #1829 and made a few fixes. #1829 introduced an interesting issue. With this PR, a node that's a manager will always connect to itself instead of a random other manager for dispatcher and CA operations. However, if it has been demoted, it may not be able to issue certificates, and the CA client would not retry with other managers, so the demotion would not complete. I've fixed this so that after the first attempt to renew a certificate, the client will explicitly prefer random managers over the local connection. Also, a timeout was necessary for the certificate renewal request. If the node was demoted, it can get into a state where it waits forever for a leader.

This seems quite stable now. I'm not seeing any more issues with the integration tests.

I'll keep this open as a meta-PR, and update it as individual bits get merged. #1829 should be merged before #1828, and after that I can open more PRs for the other pieces.

aaronlehmann · 2017-01-07T01:32:38Z

#1829 was merged. Rebased.

aaronlehmann · 2017-01-09T18:52:16Z

#1828 was merged. Rebased to remove that part of this meta-PR.

Opened #1850 as the next step in getting these pieces merged.

aaronlehmann · 2017-01-09T22:12:55Z

connectionbroker was merged in #1850. Opened next PR: #1851.

dperny · 2017-01-10T19:38:07Z

manager/manager.go

@@ -132,9 +131,15 @@ type Manager struct {

 	cancelFunc context.CancelFunc

-	mu      sync.Mutex
+	mu     sync.Mutex
+	addrMu sync.Mutex


Can you document what these two mutexes control in a comment?

I guess it's more obvious when I look at the rest of the code.

Added comments

dperny · 2017-01-10T19:46:09Z

manager/manager.go

+		if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil {
+			l := <-m.controlListener
+			l.Close()
+			return nil, err


This is closing the control listener and then returning an error, if binding the remote fails, right?

Yes. It's safe to read from m.controlListener because this is before Run, so nothing else will be reading from that channel.

dperny · 2017-01-10T19:47:26Z

manager/manager.go

 	started chan struct{}
 	stopped bool
+
+	remoteListener  chan net.Listener
+	controlListener chan net.Listener


I'm not clear on why these listeners are returned through channels.

I think I see now; they're channels so the listeners (most importantly, the remote listener) can be hot-swapped, like when we take a "closed" one-node cluster and make it "open"?
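The reason for passing listeners through channels can be shown with a reduced model: the run loop receives a listener whenever one arrives, so a port can be bound before or after `Run` starts. This is a sketch, not the manager's code; `lateBinder` and its `served` channel are hypothetical, the buffered channel is an assumption of the sketch, and a real manager would start a gRPC server on the listener instead of closing it.

```go
package main

import (
	"fmt"
	"net"
)

// lateBinder is a hypothetical reduction of the pattern discussed above:
// the run loop receives listeners over a channel, so a port can be bound
// before Run starts or while it is already running.
type lateBinder struct {
	remoteListener chan net.Listener
	served         chan string // reports each address the loop picked up
}

func newLateBinder() *lateBinder {
	return &lateBinder{
		// Buffered (an assumption of this sketch) so BindRemote does not
		// block when Run has not started yet.
		remoteListener: make(chan net.Listener, 1),
		served:         make(chan string, 1),
	}
}

// BindRemote opens a TCP listener and hands it to the run loop.
func (b *lateBinder) BindRemote(addr string) error {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	b.remoteListener <- l
	return nil
}

// Run would start a server on each listener it receives. To stay short,
// this sketch only reports the address and closes the listener.
func (b *lateBinder) Run(stop <-chan struct{}) {
	for {
		select {
		case l := <-b.remoteListener:
			b.served <- l.Addr().String()
			l.Close()
		case <-stop:
			return
		}
	}
}

func main() {
	b := newLateBinder()
	stop := make(chan struct{})
	go b.Run(stop) // the manager is already running...

	// ...and the remote API port is bound afterwards.
	if err := b.BindRemote("127.0.0.1:0"); err != nil {
		fmt.Println("bind failed:", err)
		return
	}
	fmt.Println("remote API bound on", <-b.served)
	close(stop)
}
```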

dperny · 2017-01-10T19:49:55Z

manager/manager.go

+}
+
+// BindRemote binds a port for the remote API.
+func (m *Manager) BindRemote(ctx context.Context, addrs RemoteAddrs) error {


Nit: it may make more sense to put these method definitions in the order that they're called in the initialization.

Order changed

dperny · 2017-01-10T19:56:39Z

manager/state/raft/raft.go

+		defer cancel()
+
+		isLeader := atomic.LoadUint32(&n.signalledLeadership) == 1
+		for !isLeader {


I don't understand what this loop does.

As we discussed offline, currently only the leader can submit proposals such as configuration changes. For now, this SetAddr function is intended to address the narrow use case of a single-node cluster that didn't have ports bound before. Since the cluster only has one member, this node must be the leader, but there can be delays before that leadership is confirmed, for example at startup. This loop waits until the current node becomes the leader.

If/when we extend SetAddr to handle more general cases, we'll need some other approach here, such as an RPC we can use to tell the leader to update a node's address.

cyli · 2017-01-12T00:33:44Z

manager/manager.go

@@ -71,7 +71,7 @@ type Config struct {

 	// RemoteAPI is a listening address for serving the remote API, and
 	// an optional advertise address.
-	RemoteAPI RemoteAddrs
+	RemoteAPI *RemoteAddrs


Out of curiosity why the change to a pointer?

So it can be nil before it's initialized

I meant why not just check the default value, as opposed to nil? If ListenAddr is "", there is no remote API set, right? I'm not arguing that's better or anything - more just asking whether pointers are generally preferred over checking against the default value.

What you're suggesting should work fine. I felt that since RemoteAddrs is a struct, it might be confusing to check one particular field to see if the struct is initialized.

Ah ok, thank you for explaining
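The two options in this thread can be compared side by side. This sketch assumes `RemoteAddrs` has `ListenAddr` and `AdvertiseAddr` fields (as the review suggests); `Config`, `ValueConfig`, and the helper functions are illustrative only.

```go
package main

import "fmt"

// RemoteAddrs holds a listen address and an optional advertise address.
type RemoteAddrs struct {
	ListenAddr    string
	AdvertiseAddr string
}

// Config uses a pointer, as in the PR: nil means "no remote API configured".
type Config struct {
	RemoteAPI *RemoteAddrs
}

// ValueConfig is the alternative from the review: a value-typed field whose
// ListenAddr is compared with the zero value.
type ValueConfig struct {
	RemoteAPI RemoteAddrs
}

func hasRemoteAPI(c Config) bool { return c.RemoteAPI != nil }

func hasRemoteAPIValue(c ValueConfig) bool { return c.RemoteAPI.ListenAddr != "" }

func main() {
	fmt.Println(hasRemoteAPI(Config{}))                                                    // false
	fmt.Println(hasRemoteAPI(Config{RemoteAPI: &RemoteAddrs{ListenAddr: "0.0.0.0:4242"}})) // true
	fmt.Println(hasRemoteAPIValue(ValueConfig{}))                                         // false
}
```

Both checks work; the pointer states "the struct was supplied" directly, while the value version ties that meaning to one particular field.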

cyli · 2017-01-12T01:03:45Z

manager/state/raft/membership/cluster.go

+	newMember.RaftMember.Addr = newAddr
+	c.members[id] = &newMember
+
+	if oldMember.RaftMember.Addr != newAddr {


Non-blocking: maybe we can move the copying of the RaftMember object inside this block, since we don't need to make an update if the address hasn't changed?
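The suggestion above (copy only when the address actually changes) could look roughly like this. The types are hypothetical reductions of the membership package, and copying the member instead of mutating it in place is an assumption of this sketch.

```go
package main

import "fmt"

// RaftMember and Member are hypothetical reductions of the membership types.
type RaftMember struct {
	RaftID uint64
	Addr   string
}

type Member struct {
	*RaftMember
}

type Cluster struct {
	members map[uint64]*Member
}

// UpdateAddr only builds a new member when the address changed; otherwise
// it is a no-op. The copy means holders of the old *Member keep a
// consistent snapshot.
func (c *Cluster) UpdateAddr(id uint64, newAddr string) (bool, error) {
	oldMember, ok := c.members[id]
	if !ok {
		return false, fmt.Errorf("member %x not found", id)
	}
	if oldMember.RaftMember.Addr == newAddr {
		return false, nil
	}
	newRaftMember := *oldMember.RaftMember
	newRaftMember.Addr = newAddr
	c.members[id] = &Member{RaftMember: &newRaftMember}
	return true, nil
}

func main() {
	c := &Cluster{members: map[uint64]*Member{
		1: {RaftMember: &RaftMember{RaftID: 1, Addr: "10.0.0.1:4242"}},
	}}
	changed, err := c.UpdateAddr(1, "10.0.0.1:4242")
	fmt.Println(changed, err) // false <nil>
	changed, err = c.UpdateAddr(1, "10.0.0.9:4242")
	fmt.Println(changed, err) // true <nil>
}
```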

cyli

LGTM

LK4D4 · 2017-01-12T18:50:15Z

manager/state/raft/raft.go

+
+	n.opts.Addr = addr
+
+	if n.IsMember() {


I think it's better to just return here on !n.IsMember()
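The early-return shape suggested here, sketched on a hypothetical `node` type whose `propose` callback stands in for proposing the address change to raft (the leadership wait is left out):

```go
package main

import "fmt"

// node is a hypothetical reduction of the raft node for this sketch.
type node struct {
	addr    string
	member  bool                    // whether the node has joined raft
	propose func(addr string) error // stand-in for proposing the change
}

func (n *node) IsMember() bool { return n.member }

// SetAddr records the new address and proposes it only for an existing
// member. Returning early on !IsMember keeps the main path unindented.
func (n *node) SetAddr(addr string) error {
	n.addr = addr
	if !n.IsMember() {
		// Not in a cluster yet: the address is simply used when joining.
		return nil
	}
	return n.propose(addr)
}

func main() {
	proposals := 0
	propose := func(string) error { proposals++; return nil }

	standalone := &node{propose: propose}
	member := &node{member: true, propose: propose}
	_ = standalone.SetAddr("10.0.0.1:4242")
	_ = member.SetAddr("10.0.0.1:4242")
	fmt.Println("proposals:", proposals) // proposals: 1
}
```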

LK4D4 · 2017-01-12T18:53:34Z

manager/state/raft/raft.go

+		leadershipCh, cancel := n.SubscribeLeadership()
+		defer cancel()
+
+		isLeader := atomic.LoadUint32(&n.signalledLeadership) == 1


signalledLeadership is a really weird name now :) We should call it isLeader and use it instead of the isLeader function. Otherwise, it's quite confusing here.
Probably in another PR.

aaronlehmann · 2017-01-13T20:32:24Z

Rebased, PTAL

aaronlehmann · 2017-01-19T18:35:11Z

Rebased again, and added t.Parallel to the new test.

This should be ready to merge.

LK4D4 · 2017-01-23T19:03:38Z

manager/manager.go

+	if m.config.ControlAPI != "" {
+		return errors.New("manager already has a control API address")
+	}
+	m.config.ControlAPI = addr


I wonder if we should clear this in case of error.

LK4D4 · 2017-01-23T19:18:38Z

manager/state/raft/raft.go

+			if leadershipChange == IsLeader {
+				isLeader = true
+			}
+		case <-ctx.Done():


I'm not sure, but it seems like this ctx might be context.Background(), and the node could be stopped during this loop. Maybe we should use WithContext.

Switched to WithContext.

LK4D4 · 2017-01-23T19:26:52Z

manager/state/raft/raft.go

@@ -1092,6 +1151,11 @@ func (n *Node) reportNewAddress(ctx context.Context, id uint64) error {
 	if err != nil {
 		return err
 	}
+	if oldAddr == "" {


Is this really possible?

Yeah, I encountered it in the late-binding integration test. It happens when you create a manager without a bound port, then later bind one. The node update that adds the address gets committed to raft at some point, and new nodes that join the cluster may take a little while to catch up to that.

It's quite weird. newPeer calls grpc.Dial which is supposed to check if address is empty :/

Hmm, should we avoid calling grpc.Dial if newPeer is passed an empty address?

I dunno, there shouldn't be empty addresses, and I wonder why it works and registers in the peer list at all.
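The question raised above (skip dialing when `newPeer` gets an empty address) could take roughly this shape. This only illustrates the suggestion from the thread, not what was merged; `peer`, `newPeer`, `errEmptyAddr`, and the `dial` callback are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var errEmptyAddr = errors.New("peer address is empty")

type peer struct {
	id   uint64
	addr string
}

// newPeer refuses to create a peer without an address instead of relying on
// the dialer to reject it. dial is a stand-in for grpc.Dial.
func newPeer(id uint64, addr string, dial func(string) error) (*peer, error) {
	if addr == "" {
		return nil, errEmptyAddr
	}
	if err := dial(addr); err != nil {
		return nil, err
	}
	return &peer{id: id, addr: addr}, nil
}

func main() {
	dial := func(addr string) error { return nil } // pretend dialing always succeeds
	if _, err := newPeer(1, "", dial); err != nil {
		fmt.Println("skipped:", err)
	}
	if p, err := newPeer(2, "10.0.0.2:4242", dial); err == nil {
		fmt.Println("peer", p.id, "at", p.addr)
	}
}
```

A caller such as the `oldAddr == ""` path could then wait for the member's address to arrive through raft instead of registering an unreachable peer.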

LK4D4 · 2017-01-23T19:55:17Z

manager/manager.go

+		m.config.RemoteAPI = nil
+		// The context isn't used in this case (before (*Manager).Run).
+		if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil {
+			l := <-m.controlListener


This can hang forever. So probably you need to check len(l).

Yes, this looks wrong. This code should only run if config.ControlAPI != "". I've fixed that.

Is there a use case where there is a remote API, but no control API?

There isn't currently any use case without a control API. It's structured that way to be consistent with BindRemote.

LK4D4 · 2017-01-23T19:55:30Z

manager/manager.go

+	if config.ControlAPI != "" {
+		m.config.ControlAPI = ""
+		if err := m.BindControl(config.ControlAPI); err != nil {
+			return nil, err


Shouldn't we close listener here as well?

If BindControl returns an error, there is no listener to close.

LK4D4 · 2017-01-23T19:58:15Z

manager/state/raft/membership/cluster.go

@@ -143,6 +143,7 @@ func (c *Cluster) UpdateMember(id uint64, m *api.RaftMember) error {
 		return nil
 	}
 	oldMember.RaftMember = m
+	c.broadcastUpdate()


LK4D4 · 2017-01-23T20:01:31Z

manager/state/raft/raft.go

@@ -342,6 +398,9 @@ func (n *Node) JoinAndStart(ctx context.Context) (err error) {
 	}

 	// join to existing cluster
+	if n.opts.Addr == "" {


I don't really understand the sequence of events. What ensures that BindRemote is called before Run?

It does not need to be called before Run. See the new test TestServiceCreateLateBind, where it is called afterwards.

Then I don't understand how it works. Isn't it possible to call JoinAndStart before BindRemote and attempt to join the raft cluster without knowing our own address?

You need to bind first if you are trying to join a cluster, but if you're starting a new cluster then you hit the n.opts.JoinAddr == "" branch and having an address is not required.
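The ordering rule in this answer reduces to a small check: bootstrapping needs no address of its own, while joining does. A sketch with hypothetical names (`raftOpts`, `checkStart`, `errNoAddr`); the real `JoinAndStart` does much more than this.

```go
package main

import (
	"errors"
	"fmt"
)

// raftOpts is a hypothetical reduction of the raft node options.
type raftOpts struct {
	Addr     string // this node's own address; may be empty until BindRemote
	JoinAddr string // an existing member to join; empty to start a new cluster
}

var errNoAddr = errors.New("an address is required to join an existing cluster")

// checkStart mirrors the rule described above: a new cluster can start
// without an address, but joining needs one so other members can reach us.
func checkStart(o raftOpts) (string, error) {
	if o.JoinAddr == "" {
		return "bootstrap new cluster", nil
	}
	if o.Addr == "" {
		return "", errNoAddr
	}
	return "join via " + o.JoinAddr, nil
}

func main() {
	for _, o := range []raftOpts{
		{},
		{JoinAddr: "10.0.0.1:4242"},
		{Addr: "10.0.0.2:4242", JoinAddr: "10.0.0.1:4242"},
	} {
		action, err := checkStart(o)
		fmt.Printf("%+v -> %q, %v\n", o, action, err)
	}
}
```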

This adds BindRemote and BindControl methods to Manager, which can be used to specify the remote API and control API addresses after creating or starting the manager. Raft has been updated to accept a new remote address after being started. If the RemoteAPI and ControlAPI fields are passed to manager.New, it is not necessary to call BindRemote or BindControl explicitly.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

LK4D4 · 2017-01-23T22:51:59Z

LGTM

LK4D4 mentioned this pull request Dec 21, 2016

Flaky test TestDemoteDownedManager #1827

Closed

aaronlehmann force-pushed the late-port-binding branch from d1c9d94 to a9b4a97 December 21, 2016 19:41

aaronlehmann mentioned this pull request Dec 21, 2016

Expose needed services on the control socket #1828

Merged

aaronlehmann mentioned this pull request Dec 28, 2016

dockerd exits when init swarm fails: network is unreachable moby/moby#29580

Closed

aaronlehmann changed the title ~~[WIP] Allow managers not to expose a remote API port~~ [Meta] Allow managers not to expose a remote API port Jan 5, 2017

aaronlehmann force-pushed the late-port-binding branch from a9b4a97 to dbde084 January 5, 2017 03:10

aaronlehmann force-pushed the late-port-binding branch 2 times, most recently from 827f310 to 7f69ecb January 7, 2017 01:32

aaronlehmann mentioned this pull request Jan 9, 2017

return error when listNode fails moby/moby#29980

Merged

aaronlehmann force-pushed the late-port-binding branch from 7f69ecb to 02cf17a January 9, 2017 18:45

aaronlehmann mentioned this pull request Jan 9, 2017

Add connectionbroker package #1850

Merged

aaronlehmann force-pushed the late-port-binding branch 2 times, most recently from 000704c to 9cb5f3c January 9, 2017 21:52

This was referenced Jan 10, 2017

Update vendored swarmkit to 62d835f moby/moby#30035

Merged

Integration test failure: TestDemoteToSingleManager #1680

Closed

Integration test failure: TestNodeOps #1673

Closed

dperny approved these changes Jan 10, 2017

cyli reviewed Jan 12, 2017

aaronlehmann force-pushed the late-port-binding branch from 31adb48 to e3b61ab January 12, 2017 01:46

cyli approved these changes Jan 12, 2017

LK4D4 suggested changes Jan 12, 2017

aaronlehmann force-pushed the late-port-binding branch from e3b61ab to a7ece0b January 13, 2017 20:07

aaronlehmann force-pushed the late-port-binding branch 2 times, most recently from eead57f to ae6b1df January 19, 2017 18:34

LK4D4 reviewed Jan 23, 2017

aaronlehmann force-pushed the late-port-binding branch from ae6b1df to 970f237 January 23, 2017 20:01

LK4D4 reviewed Jan 23, 2017

aaronlehmann added 2 commits January 23, 2017 12:05

integration: Test late-binding of remote API port

0eca203

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

aaronlehmann force-pushed the late-port-binding branch from 970f237 to 0eca203 January 23, 2017 20:05

LK4D4 approved these changes Jan 23, 2017

LK4D4 merged commit 037b491 into moby:master Jan 23, 2017

aaronlehmann deleted the late-port-binding branch January 23, 2017 23:18

aaronlehmann mentioned this pull request Mar 28, 2017

Why a localhost addr appear in the Cluster info moby/moby#30119

Closed
