Allow replacing ingress network #2028

Merged: 1 commit merged into moby:master on Mar 13, 2017

Conversation

@aboch commented Mar 9, 2017

Please see moby/moby/pull/31714 for details.

Signed-off-by: Alessandro Boch <aboch@docker.com>

@aboch (Author) commented Mar 9, 2017

This will now need a rebase, but I wanted the review process to start.

// this information, so that on reboot or leader re-election, new leader
// will not automatically create the routing mesh if it cannot find one
// in store.
bool routingMeshOff = 10;
Collaborator:

A few comments here:

  • protobuf names should use underscores, for example routing_mesh_off. This gets converted to CamelCase by the protobuf compiler for Go code.
  • Maybe routing_mesh_disabled is better?
  • It's too bad we don't have a message inside Cluster to store networking options, to organize this and stuff like network_bootstrap_keys and encryption_key_lamport_clock. Maybe it's not too late to start one, though.

Author:

I did not opt for enabled/disabled because this variable indicates a state, which could be just temporary until the user creates a custom ingress network. I thought on/off was more appropriate for a state, while enabled/disabled is more for a config (which we do not have).

But, not being a native speaker, routing_mesh_disabled works just as well for me. Will change to that.

@aaronlehmann (Collaborator) commented Mar 9, 2017

I think it may be possible to avoid keeping this state. Right now we create the default cluster object here:

https://github.com/docker/swarmkit/blob/a5eb9c0c8c82c544819f3811f9bfba44c1546998/manager/manager.go#L882

What if we created the ingress network at the same time, but only if the CreateCluster call succeeded? This would ensure that a swarm cluster always gets an ingress network by default, but if it is deleted later on, it would not be recreated automatically.

I'm not sure if this would be sort of hacky. It seems like it would simplify things a bit, but I don't have a strong opinion.
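A minimal sketch of the idea, assuming hypothetical helpers defaultClusterObject and newIngressNetwork, with memStore standing in for the manager's MemoryStore and store being swarmkit's manager/state/store package:

err := memStore.Update(func(tx store.Tx) error {
	// If the default cluster object already exists, CreateCluster fails
	// and the transaction aborts: this is not a fresh swarm.
	if err := store.CreateCluster(tx, defaultClusterObject()); err != nil {
		return err
	}
	// Fresh swarm: seed the ingress network alongside the cluster object.
	// If the user deletes it later, nothing recreates it automatically.
	return store.CreateNetwork(tx, newIngressNetwork())
})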

Author:

I am happy to avoid the extra state, but the other options I had before were also sort of hacky, like adding a mock ingress network object to the store to denote that the ingress network need not be created.

IIUC, the allocator would attempt to create the default cluster object; a failure would indicate this is a leader re-election, or a cluster start after a shutdown. In that case I can do:

if ErrNoIngress && cannot create cluster object {
  routingMeshDisabled = true
}

But what does it mean if the allocator succeeds in creating the default cluster object?
Once the allocator gets a chance to run, isn't the default cluster object already present in the store, as created by the manager? Or is the allocator's cluster object a mock one not seen by anybody else?

Author:

I rejected adding the mock ingress network because, if the user downgrades docker, the previous version would not handle the mock network properly.

Collaborator:

I was suggesting moving ingress network creation out of the allocator, to where the initial cluster and node objects are created (see link). It kind of makes sense to do that, since there would be a single place where initial resources are created.

Every time the manager enters the leader state, it tries to create a default cluster object, because this might be a fresh swarm that has never had a leader before. We could also create an ingress network at this time, if creating the default cluster object succeeded (which means this is indeed a fresh swarm).

Author:

Interesting.
Ideally I would prefer allocating the networking stuff all in one place, in allocator/network.go.
But what you are suggesting would mean just pushing the default ingress network object to the store, without yet allocating it.
Let me give it a try.

Author:

Tested it, it works. Thanks.

@@ -321,6 +321,9 @@ message NetworkSpec {
// enabled(default case) no manual attachment to this network
// can happen.
bool attachable = 6;

// Ingress indicates this network will provide the routing-mesh.
Collaborator:

Let's mention in the comment that legacy ingress networks won't have this flag set, and instead have the com.docker.swarm.internal label, and the name ingress.
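A hedged sketch of how such a legacy network could be recognized (the helper name isLegacyIngress is illustrative, not from the PR):

// isLegacyIngress reports whether a network predates the Ingress flag:
// legacy ingress networks carry the com.docker.swarm.internal label
// and are named "ingress".
func isLegacyIngress(n *api.Network) bool {
	if n.Spec.Annotations.Name != "ingress" {
		return false
	}
	_, ok := n.Spec.Annotations.Labels["com.docker.swarm.internal"]
	return ok
}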

Author:

Ok

clusters, err = store.FindClusters(readTx, store.ByName(store.DefaultClusterName))
}); err != nil {
return errors.Wrapf(err,
"failed to retrieve cluster object to check routing mesh state during init: %v", err)
Collaborator:

Remove the %v along with the err argument. errors.Wrapf already appends err's message, so this would include err twice in the string.
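For illustration, the corrected call would be:

return errors.Wrap(err,
	"failed to retrieve cluster object to check routing mesh state during init")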

Author:

thanks, I will change it.

// If ingress network is not found, create one right away
// using the predefined template.
if len(networks) == 0 {
nc.clusterID = clusters[0].ID
Collaborator:

You should check that clusters has a nonzero length. FindClusters will not return an error if it doesn't find any matching clusters.

There should always be a default cluster, so it would be a bug if one didn't exist. But I think it's better to return an error than crash in that case.
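A sketch of the suggested guard, following the names in the quoted snippet:

if len(clusters) == 0 {
	// FindClusters returns no error for an empty result, so guard
	// explicitly; a missing default cluster would be a bug.
	return errors.New("default cluster object not found in store")
}
nc.clusterID = clusters[0].ID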

Author:

Good to know; I was in fact not sure about this.
I will follow the length-check logic I saw in other parts of the code.

@@ -992,3 +1090,40 @@ func updateTaskStatus(t *api.Task, newStatus api.TaskState, message string) {
t.Status.Message = message
t.Status.Timestamp = ptypes.MustTimestampProto(time.Now())
}

func GetIngressNetwork(s *store.MemoryStore) (*api.Network, error) {
Collaborator:

You'll need a comment to pass lint.

Author:

yep

// if you change this function, you have to change createInternalNetwork in
// the tests to match it (except the part where we check the label).
if err := validateNetworkSpec(request.Spec, s.pg); err != nil {
if err := s.validateNetworkRequest(request.Spec, s.pg); err != nil {
Collaborator:

This should be called inside the s.store.Update call below, to make sure multiple CreateNetwork invocations can't race to create multiple ingress networks.

You might consider changing GetIngressNetwork to take a store.ReadTx instead of a store.
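A sketch of the suggested restructuring; GetIngressNetworkTx is a hypothetical variant of GetIngressNetwork taking a store.ReadTx, as proposed above:

err := s.store.Update(func(tx store.Tx) error {
	if request.Spec.Ingress {
		// The check now runs inside the same transaction as the create,
		// so concurrent CreateNetwork calls cannot both pass it.
		if existing, err := GetIngressNetworkTx(tx); err == nil {
			return grpc.Errorf(codes.AlreadyExists,
				"ingress network (%s) is already present", existing.ID)
		}
	}
	return store.CreateNetwork(tx, n)
})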

Author:

Ah, good suggestion. I was under the impression we only run a best-effort check, and that a concurrent ingress network creation would anyway fail to be allocated by the allocator.
If we can protect against the race, all the better.

nc.clusterID = clusters[0].ID

// Check if we have the ingress network. If not found create
// it before reading all network objects for allocation.
Collaborator:

All of the code to check for the ingress network and create it if necessary should be inside a store.Update transaction. Otherwise, something else could create an ingress network through the API at the same time.

Author:

Thanks for the advice. I will see if I can refactor it.

Author:

Actually, this code is only executed on the elected leader once the allocator is started (run()).
The allocator is the only component which will check and create the ingress network if needed.
Also, I do not think the system can receive a user request to create an ingress network at this time.
Hopefully we do not need to worry about concurrent creations here.

if err != nil {
return errors.Wrap(err, "failed to find ingress network after creating it")
return errors.Wrapf(err, "failed to find ingress network after creating it: %v", err)
Collaborator:

%v", err is redundant

Author:

ok

if doesServiceNeedIngress(service) {
if _, err := allocator.GetIngressNetwork(s.store); err != nil {
if grpc.Code(err) == codes.NotFound {
return grpc.Errorf(codes.PermissionDenied, "service needs ingress network, but ingress network is not present")
Collaborator:

I think this should use codes.FailedPrecondition.
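For context (not from the PR itself): FailedPrecondition signals that the system state, here the absence of an ingress network, must change before the call can succeed, which fits better than PermissionDenied. The change would read:

return grpc.Errorf(codes.FailedPrecondition,
	"service needs ingress network, but ingress network is not present")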

Author:

ok

if doesServiceNeedIngress(service) {
if _, err := allocator.GetIngressNetwork(s.store); err != nil {
if grpc.Code(err) == codes.NotFound {
return nil, grpc.Errorf(codes.PermissionDenied, "service needs ingress network, but ingress network is not present")
Collaborator:

I think this should use codes.FailedPrecondition.

Author:

Makes sense. Will change it.

@@ -424,6 +425,33 @@ func (s *Server) checkSecretExistence(tx store.Tx, spec *api.ServiceSpec) error
return nil
}

func doesServiceNeedIngress(srv *api.Service) bool {
if srv.Spec.Endpoint.Mode != api.ResolutionModeVirtualIP {
Collaborator:

Endpoint will be nil if unspecified in the API request, so be sure to check before dereferencing it.

Author:

I have not hit this problem during testing. Probably the cli client fills in a default endpoint object, as it defaults to VIP mode. Here the right thing to do is to check for nil. Thanks.
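A sketch of the nil-safe version; the published-ports loop is an assumption about the rest of the function body, mirroring the allocator's isIngressNetworkNeeded:

func doesServiceNeedIngress(srv *api.Service) bool {
	// Endpoint may be nil when unspecified in the API request.
	if srv.Spec.Endpoint == nil || srv.Spec.Endpoint.Mode != api.ResolutionModeVirtualIP {
		return false
	}
	for _, p := range srv.Spec.Endpoint.Ports {
		if p.PublishMode == api.PublishModeIngress {
			return true
		}
	}
	return false
}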

@@ -691,6 +786,9 @@ func (a *Allocator) allocateService(ctx context.Context, s *api.Service) error {
// world. Automatically attach the service to the ingress
// network only if it is not already done.
if isIngressNetworkNeeded(s) {
if nc.ingressNetwork == nil {
return fmt.Errorf("Ingress network is missing")
Collaborator:

Lowercase ingress (Go error strings should not be capitalized).

return n, nil
}
}
return nil, grpc.Errorf(codes.NotFound, "no ingress network found")
Collaborator:

I think it's better not to return a gRPC error here, because the allocator doesn't use gRPC at all. Maybe define an exported error and return that.

var ErrNoIngressNetwork = errors.New("no ingress network found")

Then callers can easily check if this error was returned, and translate to a gRPC error if appropriate.
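A sketch of the caller-side translation in controlapi (illustrative, following this suggestion):

if _, err := allocator.GetIngressNetwork(s.store); err != nil {
	if err == allocator.ErrNoIngressNetwork {
		return grpc.Errorf(codes.FailedPrecondition,
			"service needs ingress network, but ingress network is not present")
	}
	return err
}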

Author:

It's because the controlapi functions which validate the service and network specs call this method.

Author:

Got it, I will export an error for that.

codecov bot commented Mar 10, 2017

Codecov Report

Merging #2028 into master will decrease coverage by 0.11%.
The diff coverage is 43.81%.

@@            Coverage Diff             @@
##           master    #2028      +/-   ##
==========================================
- Coverage   53.93%   53.82%   -0.12%     
==========================================
  Files         109      109              
  Lines       19100    19187      +87     
==========================================
+ Hits        10302    10327      +25     
- Misses       7561     7608      +47     
- Partials     1237     1252      +15


@aboch (Author) commented Mar 10, 2017

@aaronlehmann Updated. Tested re-election with the new logic you suggested; it works fine.

@@ -112,6 +114,13 @@ func (s *Server) CreateNetwork(ctx context.Context, request *api.CreateNetworkRe
}

err := s.store.Update(func(tx store.Tx) error {
if request.Spec.Ingress {
if n, err := allocator.GetIngressNetwork(s.store); err == nil {
return grpc.Errorf(codes.PermissionDenied, "ingress network (%s) is already present", n.ID)
Collaborator:

codes.AlreadyExists

}
for _, srv := range services {
if doesServiceNeedIngress(srv) {
return grpc.Errorf(codes.PermissionDenied, "ingress network cannot be removed because service %s depends on it", srv.ID)
Collaborator:

codes.FailedPrecondition

@aaronlehmann (Collaborator):

Design LGTM

@@ -163,8 +111,11 @@ func (a *Allocator) doNetworkInit(ctx context.Context) (err error) {
}
}

skipIngressNetworkAllocation:
Contributor:

I think this label is confusing, because the flow from line 89 (nc.ingressNetwork = ingressNetwork) could also reach here.

Author:

Yes, code from line 89 is expected to reach here.

Let me see if modifying the if blocks into a switch makes it clearer, so that I can avoid using the goto.

Signed-off-by: Alessandro Boch <aboch@docker.com>
@aboch (Author) commented Mar 13, 2017

Thanks @dongluochen. I reworked the code to avoid the goto, via a switch, and added some comments.
I think the code is easier to follow now. PTAL.

@dongluochen (Contributor) left a comment:

LGTM

@@ -112,6 +114,13 @@ func (s *Server) CreateNetwork(ctx context.Context, request *api.CreateNetworkRe
}

err := s.store.Update(func(tx store.Tx) error {
if request.Spec.Ingress {
Contributor:

nit: I think it is better to move this check into store.CreateNetwork(tx, n), where it already checks for duplicate names.

Author:

Not sure if it is appropriate to percolate the ingress notion down into store.createNetwork().
It is true that it currently does a check on duplicate network names, but that is a more generic, top-level resource-name check.
If you guys think it makes sense, I can move it there now.

Contributor:

I think either is fine.

@aaronlehmann (Collaborator):

LGTM

@aaronlehmann aaronlehmann merged commit 2b1b24b into moby:master Mar 13, 2017
}
}
return false
}
Collaborator:

@aboch: Sorry for the post-merge comment. I was just looking at this again, and I noticed that this function is similar to (but not the same as) isIngressNetworkNeeded in the allocator. Are the differences correct?

Would it be possible to combine these functions as an exported function in the allocator? I think it's better for controlapi not to implement this logic directly.

@aboch (Author) commented Mar 23, 2017

> Would it be possible to combine these functions as an exported function in the allocator?

Agree

Regarding the differences, I want to double-check. Probably some are redundant given the spec validation that happens first.

@archisgore commented:

Was this intended to prevent service connections to any "internal" network at all, or merely the ingress network? From the documentation, I assumed internal was a generic network type that application developers could use to place sensitive services in.

@aboch (Author) commented May 9, 2017

@archisgore I am not sure I follow; I am adding some more background so that we are on the same page for further clarification.

This change allows the user to remove and/or replace the swarm ingress network. It gives interested users control over the ingress network configuration. The ingress network is the infrastructure network which provides the routing mesh for traffic entering the host.

In the swarmkit code this network was previously marked with the com.docker.swarm.internal label, just for identification purposes.

In the docker networking model, there is also an internal top-level operator option which can be used when creating a network. It isolates the containers on that network from the outside world.

This internal option cannot be accepted for the ingress network; they are conflicting concepts.

@archisgore commented:

Oh okay. I see. I'm seeing some strange behavior with the latest builds where:

docker network create --driver=overlay --internal foobar

Followed by a RemoteAPI call to start a service connected to that network returns:

"Error":"Error response from daemon: rpc error: code = 3 desc = Service cannot be explicitly attached to \"foobar\" network which is a swarm internal network"

But if I do:

docker network create --driver=overlay foobar

The same call works. I searched for that error string and found this commit.

@aboch (Author) commented May 9, 2017

Thank you @archisgore. What you found is a bug. Will take care of it.
