Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple CIDR addresses for the pod SDN #14558

Merged
merged 2 commits into from
Sep 28, 2017

Conversation

JacobTanenbaum
Copy link
Contributor

@JacobTanenbaum JacobTanenbaum commented Jun 9, 2017

Work in progress PR for multiple CIDR address work. addresses can be defined in the master config file by

networkConfig:
  clusterNetworkCIDR: ""
  clusterNetworkConfig:
  - clusterNetworkCIDR: 10.128.0.0/28
  - clusterNetworkCIDR: 12.128.0.0/28
  externalIPNetworkCIDRs: null

or by passing a comma seporated list of them on the command line using --network-cidr="10.128.0.0/28,12.128.0.0/28"

besides general review - I could use some feedback on how to do a few things

  • How should I deal with a config file that defined ClusterNetworkCIDR the old way?

  • In pkg/sdn/plugin/master.go there where checks to see if the cluster cidr was shrunk, I don't think that check is still required. It should be valid to break a large cluster cidr into it's smaller components. Is checking that all objects are allocated in defined places enough? As done by checkclusterobjects() in pkg/sdn/plugin/common.go

  • I need to change oc get clusternetwork to show a comma separated list

  • finish the unit testing

review of the work so far and input on the above questions are appreciated
@knobunc @dcbw @danwinship @rajatchopra

@danwinship
Copy link
Contributor

addresses can be defined in the master config file by

networkConfig:
  clusterNetworkCIDR: ""
  clusterNetworkConfig:
  - clusterNetworkCIDR: 10.128.0.0/28
  - clusterNetworkCIDR: 12.128.0.0/28

Hm... Is there any reason you did it that way? It seems like it would be nicer to just have

clusterNetworkCIDRs:
- 10.128.0.0/28
- 12.128.0.0/28

aka

clusterNetworkCIDRs: [ "10.128.0.0/28", "12.128.0.0/28" ]
How should I deal with a config file that defined ClusterNetworkCIDR the old way?

Treat it the same as if it was defined the new way, but with only a single CIDR range

In pkg/sdn/plugin/master.go there where checks to see if the cluster cidr was shrunk, I don't think that check is still required. It should be valid to break a large cluster cidr into it's smaller components. Is checking that all objects are allocated in defined places enough? As done by checkclusterobjects() in pkg/sdn/plugin/common.go

Yes, that sounds right. Make sure the nodes do the right thing on restart as well.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2017
@knobunc
Copy link
Contributor

knobunc commented Jun 12, 2017

@openshift/networking

@knobunc
Copy link
Contributor

knobunc commented Jun 12, 2017

@danwinship the reason he did it that way is because we will shortly need to accommodate hot-resizing the subnets (or even supporting a way to have different size host subnets per range, perhaps with a way to label the subnets so different nodes can deliberately have different allocations).

Copy link
Contributor

@pecameron pecameron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick pass through. I am learning about this as I go so not certain if this is at all helpful.

name := formatResourceName(opts.Kind, n.Name, opts.WithKind)
_, err := fmt.Fprintf(w, "%s\t%s\t%d\t%s\t%s\n", name, n.Network, n.HostSubnetLength, n.ServiceNetwork, n.PluginName)
_, err := fmt.Fprintf(w, "%s\t%s\t%d\t%s\t%s\n", name, strings.Join(cidrList, ","), n.HostSubnetLength, n.ServiceNetwork, n.PluginName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing them all on one line will mess up the report. I assume that people that like this feature will have a pretty fragments IP space.

Copy link

@pravisankar pravisankar Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will be hard to read when there are several cidrs. May be one row for each cidr?

for _, cidr := range n.ClusterDef {
   , err := fmt.Fprintf(w, "%s\t%s\t%d\t%s\t%s\n", name, cidr. ...)
}

@@ -497,6 +497,8 @@ type MasterNetworkConfig struct {
NetworkPluginName string `json:"networkPluginName"`
// ClusterNetworkCIDR is the CIDR string to specify the global overlay network's L3 space
ClusterNetworkCIDR string `json:"clusterNetworkCIDR"`
// ClusterNetworkConfig is used to specify the global overlay network's L3 space
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might mention that this is a list of cidrs that define the global overlay network's L3 space

@@ -497,6 +497,8 @@ type MasterNetworkConfig struct {
NetworkPluginName string `json:"networkPluginName"`
// ClusterNetworkCIDR is the CIDR string to specify the global overlay network's L3 space
ClusterNetworkCIDR string `json:"clusterNetworkCIDR"`
// ClusterNetworkConfig is used to specify the global overlay network's L3 space
ClusterNetworkConfig []ClusterNetworkEntry `json:"clusterNetworkConfig"`
// HostSubnetLength is the number of bits to allocate to each host's subnet e.g. 8 would mean a /24 network on the host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the minimum value that can be in the cidr list above. Might comment about this above.

@@ -20,38 +20,67 @@ func SetDefaultClusterNetwork(cn sdnapi.ClusterNetwork) {
defaultClusterNetwork = &cn
}

//intersect tests two CIDR addresses to see if the first contains the second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to contains, should check for first contained in second (unless second can't be larger than first). Should we check overlap with other defined spaces (service, exportIP, ...)?

// One poorly formated address and one well formatted
// Overlapping addresses
// Both Cluster.Network & clusterDef is defined
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test out of range (> 255) in IP field.
test mask (/xx) > 32, or < min size (at least as big as HostSubnetLength)
test overlap with other well defined spaces (service ip, external ip,...)
I think overlap can be either contained within or contained by.

@@ -30,7 +46,7 @@ func TestValidateClusterNetwork(t *testing.T) {
name: "Bad network",
cn: &api.ClusterNetwork{
ObjectMeta: metav1.ObjectMeta{Name: "any"},
Network: "10.20.0.0.0/16",
ClusterDef: []api.ClusterNetworkEntry{{ClusterNetworkCIDR: "10.20.0.0.0/16"}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad network also 10.a.0.0/16, 10.256.0.0 and 0.0.0.0/33.

@@ -39,20 +39,36 @@ func clusterNetworkToString(n *osapi.ClusterNetwork) string {
return fmt.Sprintf("%s (network: %q, hostSubnetBits: %d, serviceNetwork: %q, pluginName: %q)", n.Name, n.Network, n.HostSubnetLength, n.ServiceNetwork, n.PluginName)
}

//determine if the first argument contains any of the second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this contains the second or is contained in the second?

func parseNetworkInfo(clusterNetwork []osapi.ClusterNetworkEntry, serviceNetwork string) (*NetworkInfo, error) {
var cn []*net.IPNet

for _, entry := range clusterNetwork {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to check each cidr with all other cidrs (except for itself). How many cidrs do we support?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is needed. Do we want to restrict number of cidrs that we support? If users specify single IPs or sparse cidrs, this could be huge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean check each cidr with all other cidrs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to verify there is no overlap between the given cidrs, otherwise pods may end up with same IPs.

} else if !ni.ClusterNetwork.Contains(subnetIP) {
errList = append(errList, fmt.Errorf("existing node subnet: %s is not part of cluster network: %s", subnet.Subnet, ni.ClusterNetwork.String()))
} else if _, contains := cidrListContains(ni.ClusterNetwork, subnetIP); !contains {
errList = append(errList, fmt.Errorf("existing node subnet: %s in not part of any cluster network CIDR", subnet.Subnet))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in" ?

@@ -156,7 +174,7 @@ func getNetworkInfo(osClient *osclient.Client) (*NetworkInfo, error) {
return nil, err
}

return parseNetworkInfo(cn.Network, cn.ServiceNetwork)
return parseNetworkInfo(cn.ClusterDef, cn.ServiceNetwork)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about externalIP and load balancer IP here?

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good as a first draft

ServiceNetworkCIDR string
NetworkPluginName string
ClusterNetworkCIDR string
ClusterNetworkConfig []ClusterNetworkEntry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably go with just ClusterNetworks.

It would be nice if we could get rid of the old ClusterNetworkCIDR field, but I'm not sure how backward-compat works with startup config objects. Old master-config.yaml files need to keep working though, so whatever that requires...

@@ -626,6 +627,10 @@ type MasterNetworkConfig struct {
IngressIPNetworkCIDR string
}

type ClusterNetworkEntry struct {
ClusterNetworkCIDR string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just CIDR? You've already got ClusterNetwork in the struct name and field name...

Also, if the plan is to eventually have per-network HostSubnetLength values, I think we should add that here now; if we don't support different values yet, you can just add code to make sure every ClusterNetworkEntry has the same value.

(Also, if we're tweaking HostSubnetLength handling, we might want to consider renaming it and flipping the math around; the fact that it counts bits from the end of the IP address rather than the start is confusing. (Eg, HostSubnetLength of 9 means each node gets a /23 subnet (32 - 9 = 23). I think it would be clearer for most users if the config option was "23" rather than "9".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on CIDR.
If you plan to do global/local HostSubnetLength field that I was suggesting, then.

type ClusterNetworkEntry struct {
      ClusterNetworkCIDR string
      HostSubnetLength     uint32
}

@@ -635,7 +635,8 @@ func newAdmissionChain(pluginNames []string, admissionConfigFilename string, plu

case serviceadmit.RestrictedEndpointsPluginName:
// we need to set some customer parameters, so create by hand
restrictedNetworks, err := serviceadmit.ParseSimpleCIDRRules([]string{options.NetworkConfig.ClusterNetworkCIDR, options.NetworkConfig.ServiceNetworkCIDR})
// PLACEHOLDER HAVE TO FIGURE OUT OUT TO DO
restrictedNetworks, err := serviceadmit.ParseSimpleCIDRRules([]string{options.NetworkConfig.ServiceNetworkCIDR})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to pass an array containing all of the clusternetwork cidr values and the servicenetwork cidr value

@@ -20,6 +20,8 @@ type ClusterNetwork struct {

// Network is a CIDR string specifying the global overlay network's L3 space
Network string `json:"network" protobuf:"bytes,2,opt,name=network"`
// PLACEHOLDER
ClusterDef []ClusterNetworkEntry `json:"clusterDef" protobuf:"bytes,6,rep,name=clusterDef"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/kubernetes/community/blob/master/contributors/devel/api_changes.md has some information on changing kubernetes API objects, and in particular discusses the backward-compatibility snafus that come with trying to turn a single-element field into an array... (The "API Conventions" link near the top of that document is also filled with useful information about how kubernetes APIs work.)

The easiest way to make this work backward-compatibly would be to say that the "default"/first network is still specified by the Network and HostSubnetLength fields, but the 2nd to Nth networks are part of a new array field (which could be called AdditionalNetworks or something like that).

You need a doc comment above the field, and between the doc comment and the actual declaration should be

// +optional

to indicate that it's an optional field

@@ -28,6 +30,11 @@ type ClusterNetwork struct {
PluginName string `json:"pluginName,omitempty" protobuf:"bytes,5,opt,name=pluginName"`
}

// PLACEHOLDER
type ClusterNetworkEntry struct {
ClusterNetworkCIDR string `json:"clusterNetworkCIDR" protobuf:"bytes,1,opt,name=clusterNetworkCIDR"`
Copy link
Contributor

@danwinship danwinship Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as in pkg/cmd/server/api/types.go above

@@ -16,11 +16,16 @@ type ClusterNetwork struct {
metav1.ObjectMeta

Network string
ClusterDef []ClusterNetworkEntry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "unversioned" ClusterNetwork type (the one in pkg/sdn/api/types.go, as opposed to pkg/sdn/api/v1/types.go) is an internal-only object, not part of the API, so it is not subject to backward-compatibility rules. So you could get rid of the old Network and HostSubnetLength fields here and only have the new array.

The one catch with this is that if the unversioned type and the v1 type aren't identical, then you'll need to add some hand-written conversion functions, but this isn't difficult. There are no examples in pkg/sdn/api but you can see examples in other API groups (eg, pkg/image/api/v1/conversion.go).

HostSubnetLength uint32
ServiceNetwork string
PluginName string
}

type ClusterNetworkEntry struct {
ClusterNetworkCIDR string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as in pkg/cmd/server/api/types.go above

@pravisankar
Copy link

pravisankar commented Jun 13, 2017

With multiple cluster network CIDR support, hostsubnet length don't need to be same for each CIDR. We may not add this support immediately but it will be nice if we can make the config compatible with the future change that is more likely to be added.

How about this master config?

clusterNetworkCIDR:   (deprecate this one)
hostSubnetLength: 2   (global)
clusterNetworkCIDRs:
    - CIDR: 10.128.0.0/28
      hostSubnetLength: 2   (local)
    - CIDR: 12.128.0.0/28
      hostSubnetLength: 2
  • If local hostSubnetLength field is not set, apply the global hostSubnetLength (during default conversions)
  • If global and any of the local hostSubnetLength fields (if set) are not matching, bail out with an error similar to 'different host subnet lengths for cluster network cidrs are not currently supported'.

How should I deal with a config file that defined ClusterNetworkCIDR the old way?
I have deprecated/renamed fields in master/node config before (retaining backward compatibility),
you can check these PRs for reference:
6557c81
4663c08

//err = ioutil.WriteFile("/run/openshift-sdn/config.env", []byte(config), 0644)
//if err != nil {
// return false, err
//}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed in #13726

}
clusterNetworkConfig = append(clusterNetworkConfig, clusterNetworkEntry)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is command line arguments is for simple/essential use cases and complex use cases need to be specified in configuration file. If we treat multiple cidr use case as complex then we don't need to support command line args.
Supporting comma separated list here makes it harder to specify different hostsubnetLength for each cidr later.

@@ -20,6 +20,8 @@ type ClusterNetwork struct {

// Network is a CIDR string specifying the global overlay network's L3 space
Network string `json:"network" protobuf:"bytes,2,opt,name=network"`
// PLACEHOLDER
ClusterDef []ClusterNetworkEntry `json:"clusterDef"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to add protobuf tag

@@ -20,38 +20,67 @@ func SetDefaultClusterNetwork(cn sdnapi.ClusterNetwork) {
defaultClusterNetwork = &cn
}

//intersect tests two CIDR addresses to see if the first contains the second
func intersect(cidr1, cidr2 string) bool {
_, net1, _ := net.ParseCIDR(cidr1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the error (similar to cidr2 below)

@@ -178,3 +190,20 @@ func clusterNetworkChanged(obj *osapi.ClusterNetwork, old *osapi.ClusterNetwork)

return changed, nil
}

func clusterDefEquals(obj, old []osapi.ClusterNetworkEntry) bool {
if obj == nil && old == nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, condition below covers this check.

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2017
@JacobTanenbaum
Copy link
Contributor Author

@danwinship @knobunc @pravisankar Should I change the math for hostSubnetLength field? Won't that How could we upgrade a config from the old math to the new math for hostsubnetlength?

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2017
@pravisankar
Copy link

From the stand point of user using multiple cidr, with the current implementation user may not use a cidr because it conflicts with global HostSubnetLength (they don't have enough bits in the cidr to allocate for the pod). I feel cidr and hostSubnetLength go hand in hand. Why don't we work on supporting multiple cidr and hostsubnetLength per cidr? This will remove some of these limitations and will avoid unnecessary migration later on.

@openshift/networking

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2017
@JacobTanenbaum JacobTanenbaum force-pushed the CIDR-work branch 2 times, most recently from 6010d34 to 6cfab3e Compare June 22, 2017 20:40
@JacobTanenbaum JacobTanenbaum force-pushed the CIDR-work branch 6 times, most recently from 024dfea to bb7d9b7 Compare July 6, 2017 15:30
@JacobTanenbaum
Copy link
Contributor Author

@danwinship @pravisankar is this moving in the right direction? I deprecated the global hostsubnetlength field and added one per cidr. I also changed it so only one could be defined on the cli and others need to be added to the config

@knobunc
Copy link
Contributor

knobunc commented Sep 27, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 27, 2017
@pravisankar
Copy link

/lgtm

HostSubnetLength uint32 `json:"hostSubnetLength"`
// DeprecatedClusterNetworkCIDR is the CIDR string to specify the global overlay network's L3 space. Deprecated, but maintained for backwards compatibility, use ClusterNetworks instead.
DeprecatedClusterNetworkCIDR string `json:"clusterNetworkCIDR,omitempty"`
// ClusterNetworks is a list of ClusterNetwork objects that defines the global overlay network's L3 space by specifying a set of CIDR and netmasks that the SDN can allocate addressed from. If this is specified, then DeprecatedClusterNetworkCIDR and DeprecatedHostSubnetLength may not be set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the external field name - a user won't have any idea what "DeprecatedClusterNetworkCIDR" is without looking at the go code. In general, always use external field names in thees docs, no exceptions.

@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Sep 27, 2017
HostSubnetLength uint32 `json:"hostSubnetLength"`
// ClusterNetworkCIDR is the CIDR string to specify the global overlay network's L3 space. Deprecated, but maintained for backwards compatibility, use ClusterNetworks instead.
DeprecatedClusterNetworkCIDR string `json:"clusterNetworkCIDR,omitempty"`
// ClusterNetworks is a list of ClusterNetwork objects that defines the global overlay network's L3 space by specifying a set of CIDR and netmasks that the SDN can allocate addressed from. If this is specified, then DeprecatedClusterNetworkCIDR and DeprecatedHostSubnetLength may not be set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments got lost AGAIN. Basically you can't use external names here. Use the names as they would appear in JSON. Users only see this from the API and documentation, they never see the go code, so the names in Go are meaningless.

clusterNetworkCIDR and hostSubnetLength are the names they see

Chaning the Network Config section of the the master config to
allow multiple CIDR addresses and hostsubnet Lengths for the
allocation of nodes' address space
@knobunc
Copy link
Contributor

knobunc commented Sep 27, 2017

@smarterclayton Ok, good to go?

@JacobTanenbaum
Copy link
Contributor Author

/test cmd

@smarterclayton
Copy link
Contributor

API approved

@knobunc
Copy link
Contributor

knobunc commented Sep 27, 2017

/lgtm

@smarterclayton would you approve this please too? Thanks

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 27, 2017
@eparis
Copy link
Member

eparis commented Sep 28, 2017

/approve

@eparis eparis added the kind/bug Categorizes issue or PR as related to a bug. label Sep 28, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eparis, knobunc, pravisankar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 28, 2017
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 28, 2017

@JacobTanenbaum: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/check d758c79 link /test check

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@JacobTanenbaum
Copy link
Contributor Author

JacobTanenbaum commented Sep 28, 2017

flake #16414 /retest

@eparis
Copy link
Member

eparis commented Sep 28, 2017

/retest

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue (batch tested with PRs 14558, 16544).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. needs-api-review size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet