Version of Ansible (ansible --version):
ansible 2.9.20
Version of Python (python --version):
python version = 3.8.10
Kubespray version (commit) (git rev-parse --short HEAD):
v2.17.0
Anything else do we need to know:
We have run into an intermittent issue with Calico and Kubespray: the Ansible run to deploy a cluster on new machines succeeds, but workloads cannot communicate because nodelocaldns is unable to reach coredns. I traced this connectivity issue to a routing problem, as traffic to 10.233/18 was routed through my default gateway. When comparing my routing table to other environments, I noticed that the vxlan.calico interface had not been created, so traffic could not be routed over it.
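For anyone wanting to check a node for the same symptom, here is a minimal sketch. It assumes the VXLAN device carries Calico's usual name, vxlan.calico; the helper names `has_vxlan_route` and `check_routes` are hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads routing-table text (as printed by `ip route`)
# on stdin and succeeds only if at least one route uses the vxlan.calico device.
# Assumption: the device name is vxlan.calico.
has_vxlan_route() {
    grep -q 'dev vxlan\.calico'
}

# Hypothetical helper: prints a one-line verdict for the routes given on stdin.
check_routes() {
    if has_vxlan_route; then
        echo "ok: routes via vxlan.calico present"
    else
        echo "missing: no routes via vxlan.calico"
    fi
}
```

On a node this could be run as `ip route | check_routes`. Running `ip -d link show vxlan.calico` should additionally show whether the interface exists at all.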
With the help of the Calico community, I identified that Kubespray doesn't set up the calico daemonset with the environment variable FELIX_VXLANENABLED=true. This setting enables the creation of this interface. After I manually updated the daemonset with this environment variable, the interface was created and all problems were resolved.
The problem is that this issue occurs only once every few deployments. The Calico documentation states that this value is set automatically, but if the IPPool CRD was updated by Kubespray, that automatic enablement will have been bypassed. Either way, the automatic enablement does not work in some cases. So I would like to ask that this setting be included by default when VXLAN is enabled in Calico.
With the above effort, I think we have fixed this issue on both the master and release-2.17 branches.
@ekuiper-sbp Please close this issue if you can confirm that it has been solved.
Environment:
Cloud provider or hardware configuration:
On Prem - KVM
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 3.10.0-1160.45.1.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"