Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cni: ensure to setup CNI addresses in deterministic order #17766

Merged
merged 3 commits into from
Jul 6, 2023

Conversation

ygersie
Copy link
Contributor

@ygersie ygersie commented Jun 29, 2023

Currently as commented in the code the go-cni library returns an unordered map
of interfaces. In cases where there are multiple CNI interfaces being created this
creates a problem with service registration and healthchecking because the first
address in the map is being used.

The use case we have where this is an issue is that we run CNI with the macvlan
plugin to isolate workloads, but they still need to be able to access the host on
a static address to be able to perform local resolving and hit host services like
the Consul agent API. To make this work there are 2 options, you either add a
macvlan interface on the host with an assigned address for each VLAN you have or
you create an additional veth bridged interface in the container namespace.
We chose the latter option through a custom CNI plugin but the ordering issue
leaves us with incorrect service registration.

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ygersie! This seems like a great fix to me with just one little additional change needed.

It will need a changelog entry which you can write with our make cl helper. Otherwise one of us can add it to the PR.

// Unfortunately the go-cni library returns interfaces in an unordered map so
// the results may be nondeterministic depending on CNI plugin output so make
// sure we sort them by interface name.
names := make([]string, 0, len(res.Interfaces))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to reuse the sorted list for iteration in the fallback case on L168 below (right after the If no IP address was found... comment). You could define names outside of the if here for reuse in the fallback iteration below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done, please let me know if there's anything that requires attention.

@schmichael schmichael moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Jun 29, 2023
@schmichael schmichael added backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line backport/1.3.x backport to 1.3.x release line labels Jun 29, 2023
@schmichael schmichael added this to the 1.6.0 milestone Jun 30, 2023
 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.
Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; thanks @ygersie !

Comment on lines +143 to +146
names := make([]string, 0, len(res.Interfaces))
for k := range res.Interfaces {
names = append(names, k)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for k := range res.Interfaces {
names = append(names, k)
}
sort.Strings(names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized while this makes interface selection deterministic, I don't think it allows for users to control what interface is used. It seems like whatever interface is first in lexicographical order wins.

Previously you could rely on a crash/restart loop to eventually get the interface you wanted, but now you'll get the same interface every time.

Do we need to add a way to specify an interface name, or am I missing a way interface names can be controlled by users so they can rely on lexicographical ordering to do the right thing automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in my case there was no crash restart loop, it just started running with incorrect consul service registration. The reason why I think, at least for now, lexicographical order makes sense is because CNI plug-in configurations are executed in order as well. If someone wanted to change which interface address should be chosen they could move that interface up in the config list. I’m not sure if there are more use cases that would benefit from being able to specify the interface name to use (first).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort.String() will choose the first by lexicographical ordering though and not match CNI plugin configuration order though right?

It appears we could iterate over the interface names in CNI order via res.Raw().Interfaces and pluck the Config out of the res.Interfaces map based on those names?

Sorry for the complication, I think we'll just want to be able to document something like the first CNI interface returned will be used and not have alphabetical order mess things up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that what I'm doing is kind of a hack as I have a single CNI network configuration file with an additional custom plugin (which is always executed in order as defined by the plugin array in the CNI conflist file) that sets up the eth1 bridge interface to connect to the host, ignoring what is set through CNI_IFNAME.

This custom plugin then returns the previous CNI result with the additional (eth1) interface and its ip/route configuration appended to it. There's no mention in the CNI spec that this is somehow unsupported and the spec allows multiple interfaces and ip assignments in the results. In my situation I could just skip updating the result with the new eth1 interface plus its configuration entirely. In that case Nomad wouldn't ever see that eth1 has been created, however, that would be against the spec that states that you should always update the results with what the plugin has changed. Here's an example result after running my plugin that created eth1:

{
  "cniVersion": "0.4.0",
  "interfaces": [
    {
      "name": "eth0",
      "mac": "8e:f1:74:40:97:b6",
      "sandbox": "/var/run/docker/netns/dce8eb432a25"
    },
    {
      "name": "eth1",
      "mac": "e6:b9:75:cb:39:10",
      "sandbox": "/var/run/docker/netns/dce8eb432a25"
    }
  ],
  "ips": [
    {
      "version": "",
      "interface": 0,
      "address": "10.177.191.183/24",
      "gateway": "10.177.191.1"
    },
    {
      "version": "4",
      "address": "169.254.0.65/32",
      "interface": 1
    }
  ],
  "routes": [
    {
      "dst": "0.0.0.0/0",
      "gw": "10.177.191.1"
    },
    {
      "dst": "169.254.1.1/32",
      "gw": "169.254.0.65"
    }
  ],
  "dns": {}
}

And the CNI configuration file used to create above:

{
  "cniVersion": "0.4.0",
  "name": "confidential",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0.20",
      "ipam": {
        "type": "dhcp"
      },
      "mtu": 9000,
      "linkInContainer": false
    },
    {
      "type": "discovery"    <------ this sets up eth1
    }
  ]
}

Obviously whenever an external plugin would create additional interfaces we'd have the problem I'm facing now. I feel that sorting the interfaces in lexicographical order and always trying to use the address of the first interface is reasonable, especially since now it's a guessing game.

Copy link
Contributor Author

@ygersie ygersie Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also work fine in case of using the bridge plugin which would effectively return 3 interfaces:

  • host bridge
  • veth host peer
  • veth container peer

Because we check if an interface has an actual address set in the cniToAllocNet method there's no way that could lead to unexpected results since there's only one interface IP assignment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough explanation @ygersie! I'll get this merged. Going from non-deterministic -> deterministic is a great improvement, and we can always add an interface selection option in the future.

@schmichael schmichael merged commit 709f20c into hashicorp:main Jul 6, 2023
20 of 23 checks passed
Nomad - Community Issues Triage automation moved this from In Progress to Done Jul 6, 2023
schmichael pushed a commit that referenced this pull request Jul 7, 2023
* cni: ensure to setup CNI addresses in deterministic order

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.
schmichael pushed a commit that referenced this pull request Jul 7, 2023
* cni: ensure to setup CNI addresses in deterministic order

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.
schmichael pushed a commit that referenced this pull request Jul 7, 2023
* cni: ensure to setup CNI addresses in deterministic order

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.
schmichael added a commit that referenced this pull request Jul 7, 2023
…17851)

* cni: ensure to setup CNI addresses in deterministic order

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.

Co-authored-by: Yorick Gersie <6005868+ygersie@users.noreply.github.com>
schmichael pushed a commit that referenced this pull request Jul 7, 2023
…17826)

* cni: ensure to setup CNI addresses in deterministic order

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.

Co-authored-by: Yorick Gersie <6005868+ygersie@users.noreply.github.com>
schmichael pushed a commit that referenced this pull request Jul 8, 2023
…17827)

* cni: ensure to setup CNI addresses in deterministic order

  Currently as commented in the code the go-cni library returns an unordered map
  of interfaces. In cases where there are multiple CNI interfaces being created this
  creates a problem with service registration and healthchecking because the first
  address in the map is being used.

  The use case we have where this is an issue is that we run CNI with the macvlan
  plugin to isolate workloads, but they still need to be able to access the host on
  a static address to be able to perform local resolving and hit host services like
  the Consul agent API. To make this work there are 2 options, you either add a
  macvlan interface on the host with an assigned address for each VLAN you have or
  you create an additional veth bridged interface in the container namespace.
  We chose the latter option through a custom CNI plugin but the ordering issue
  leaves us with incorrect service registration.

* Updates after feedback

 * First check for the CNIResult interfaces length, if it's zero we don't need to proceed
   at all.
 * Use sorted interfaces list for the address fallback scenario as well.
 * Remove "found" log message logic, when an address isn't found an error is returned stating
   the allocation could not be configured as an address was missing from the CNIResult. If we
   still need a Warn message then we can add it to the condition that returns the error if no
   address could be found instead of using the "found" bool logic.

Co-authored-by: Yorick Gersie <6005868+ygersie@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.3.x backport to 1.3.x release line backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line hcc/cst Admin - internal
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants