
Changing the CNI_CONF_NAME to 10-calico.conflist in calico yaml causes CNI to fail #1388

Closed
bradbehle opened this issue Nov 21, 2017 · 10 comments


@bradbehle
Contributor


Expected Behavior

The CNI should continue to function when the config file name is changed to a .conflist, which is needed to support multiple chained plugins.

Current Behavior

In hosted Calico for Kubernetes, when CNI_CONF_NAME is changed to 10-calico.conflist (I did this to add the portmap plugin alongside the Calico CNI plugin I already had) and calico-node is restarted on the worker nodes, the old 10-calico.conf file is not removed. Having both the old and new configuration files in /etc/cni/net.d/ causes the CNI to fail, and new pods cannot get an IP address.

Possible Solution

When the .conflist (or a file with a different new name) is generated by the install-cni container, it could look for other Calico config files and either delete them or rename them so that CNI ignores them (if that is possible).

Steps to Reproduce (for bugs)

  1. Install kubernetes with hosted calico, using the default cni conf file name and just the calico plugin
  2. Update the calico-config ConfigMap to add the portmap plugin, and update the calico-node DaemonSet to set CNI_CONF_NAME to 10-calico.conflist
  3. Wait for a few minutes for the calico-node pods to restart and for the CNI config change to be written to the new conflist file
  4. Try to create a pod, and see that it does not get an IP address (CNI errors are reported in the logs, I believe due to multiple config files)
  5. Delete the old config file on each worker node manually, and the cluster will start working again
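The manual cleanup in step 5 can be sketched as a small helper that finds the stale single-plugin config files. This is a hypothetical script, not part of Calico; the directory and file names follow this report:

```python
import os

def find_stale_cni_confs(net_d="/etc/cni/net.d", keep="10-calico.conflist"):
    """List old Calico .conf files that conflict with the new .conflist.

    net_d is the CNI config directory; keep is the new file to preserve.
    """
    stale = []
    for name in sorted(os.listdir(net_d)):
        if name == keep:
            continue
        # The old single-plugin config uses the .conf extension;
        # .conflist files and the calico-kubeconfig file are left alone.
        if "calico" in name and name.endswith(".conf"):
            stale.append(os.path.join(net_d, name))
    return stale
```

Running this on each worker node and deleting the returned files (then letting kubelet re-read /etc/cni/net.d) is essentially what step 5 does by hand.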

Context

Trying to update Calico to also use the portmap plugin leaves the whole cluster broken. Manually deleting the old conf file from each worker node first requires knowing the inner workings of Calico (how the CNI conf file is handled and where it lives), and can mean many manual steps (or writing a script) for clusters with many worker nodes.

Your Environment

  • Calico version: 2.6.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes 1.7.4
  • Operating System and version: Ubuntu 16.04
@caseydavenport caseydavenport added this to the Calico v2.6.4 milestone Nov 21, 2017
@caseydavenport
Member

@bradbehle thanks for raising.

We've actually merged this PR, which does what you're suggesting: projectcalico/cni-plugin#392

It's in v3.0 but hasn't made it to v2.6 yet. I've added this to the v2.6.4 milestone to get this cherry-picked.

> (CNI errors are reported in the logs, I believe due to multiple config files)

One thing though, I'm surprised there would be errors related to multiple config files. I think the kubelet just picks the first one and ignores the rest.

Could you share some of these logs?

@bradbehle
Contributor Author

bradbehle commented Nov 22, 2017

Here is an excerpt of the syslog on the worker node where I found the CNI unable to handle the two conf files (one with .conf extension and one with .conflist extension). Let me know if you want other logs, but I think this should show it clearly.

cni_extra_conf_files_logs.txt

It would be great if you could get this into 2.6.4. We'd like to provide hostPort support in our next update, and we are unable to go to 3.0 at the moment. Thanks.

@caseydavenport
Member

> NetworkPlugin cni failed to set up pod "nginx-2589664690-tx4ks_default" network: no plugin name provided

Looks like this is the primary error I'm seeing in those logs, which I think just means the top-level config is missing the name field.
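That failure mode can be caught by checking a rendered config for the missing field before kubelet ever sees it. A hedged sketch; `check_conflist` is a hypothetical helper, not a kubelet or Calico API:

```python
import json

def check_conflist(text):
    """Return a list of problems with a CNI .conflist.

    A network configuration list needs a top-level "name" and a
    non-empty "plugins" array; a missing name produces errors like
    "no plugin name provided".
    """
    conf = json.loads(text)
    problems = []
    if not conf.get("name"):
        problems.append("missing top-level name")
    if not conf.get("plugins"):
        problems.append("missing plugins list")
    return problems
```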

@bradbehle
Contributor Author

Is there a way to work around this so we can update to use portmap (and the needed .conflist suffix) and not hit this CNI error? Or do we need the fix that is coming in 2.6.4?

@caseydavenport
Member

@bradbehle you shouldn't need the fix coming in 2.6.4.

I would think that a CNI config called 10-calico.conflist with the following contents would work:

    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "etcd_endpoints": "__ETCD_ENDPOINTS__",
          "log_level": "info",
          "mtu": 1500,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s",
              "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
              "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
          },
          "kubernetes": {
              "kubeconfig": "/etc/cni/net.d/__KUBECONFIG_FILENAME__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }
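As a sanity check before restarting kubelet, a rendered conflist can be parsed to confirm it is valid JSON with a top-level name and both plugins in order. This is a minimal sketch with illustrative values, not the literal file above:

```python
import json

# Minimal rendered 10-calico.conflist (illustrative values, with the
# template placeholders already substituted and most fields omitted).
rendered = """
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {"type": "calico", "ipam": {"type": "calico-ipam"}},
    {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}}
  ]
}
"""

conf = json.loads(rendered)
# calico must come first so portmap chains after it; the portmap entry
# must declare the portMappings capability for hostPort to work.
types = [p["type"] for p in conf["plugins"]]
```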

@caseydavenport
Member

@bradbehle did you have any luck with adding a name to the config?

@caseydavenport caseydavenport removed this from the Calico v2.6.4 milestone Dec 14, 2017
@caseydavenport
Member

The soon-to-be-released v2.6.4 will delete the old 10-calico.conf file, so there won't be multiple config files.

@vzuevsky

Every time I change things in /etc/cni/net.d/10-calico.conflist it gets overwritten on restart. Any idea how I could change it permanently?

@caseydavenport
Member

@vzuevsky that file is written from a configmap. Are you using the operator install or manifest install?

@vzuevsky

Hi Casey, I figured that out. The manifest was updated. I am curious, though, where all those settings end up living after the manifest is applied; I couldn't find them by grepping. Re-applying would be the right way anyway, so feel free to close this thread.
