kubespray 2.18.0 calico fails without local-loadbalancer #8864

Closed
Talangor opened this issue May 24, 2022 · 16 comments
Labels
kind/bug, lifecycle/rotten

Comments

@Talangor

Talangor commented May 24, 2022

Hi guys, and thank you for your hard work.
I previously installed a Kubernetes cluster with kubespray and the Weave CNI without any problem (kubespray 2.18.0).
Since we need BGP functionality, we decided to move to the Calico CNI. For a week I have tried the default configuration as well as the config you see today, and tested Kubernetes 1.23.6 down to 1.22.2, with no success.
I have been searching and found that if I run the localhost load balancer everything works as expected, but I don't want to use a local (nginx, haproxy) load balancer.
Is it mandatory to set use_localhost_as_kubeapi_loadbalancer: true?

Environment:

  • Cloud provider or hardware configuration:
    bare-metal installation

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 5.4.0-113-generic x86_64
    NAME="Ubuntu"
    VERSION="20.04.4 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.4 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal

  • Version of Ansible (ansible --version):
    ansible [core 2.12.5]
    config file = /home/ubuntu/kubespray-v2.18.1/ansible.cfg
    configured module search path = ['/home/ubuntu/kubespray-v2.18.1/library']
    ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
    ansible collection location = /home/ubuntu/.ansible/collections:/usr/share/ansible/collections
    executable location = /usr/local/bin/ansible
    python version = 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
    jinja version = 2.11.3
    libyaml = True

  • Version of Python (python --version):
    Python 3.8.10

Kubespray version (commit) (git rev-parse --short HEAD):
85bd1ee
2.18.1
Network plugin used:
netplan

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Command used to invoke ansible:
ansible-playbook -i inventory/pre-production/hosts.yaml --become -u sadmin -K cluster.yml

Output of ansible run:

calico kube controller log:

All pods that need Calico to create a network for them fail with the log below:

Thanks in advance for taking the time.

@Talangor Talangor added the kind/bug label May 24, 2022
@cristicalin
Contributor

cristicalin commented May 24, 2022

use_localhost_as_kubeapi_loadbalancer: true is only needed when using Calico with eBPF; if you don't set it, kubespray defaults to false. The sample config specifically states that this setting is there for Cilium, but it is needed for Calico in eBPF mode as well; otherwise it's not needed.

Could you be more precise about the error you are seeing when this is set to false?
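
A minimal sketch of the two settings under discussion, assuming the usual sample-inventory layout (the file paths below are assumptions; the variable names are the ones already used in this thread):

```yaml
# group_vars/k8s_cluster/k8s-net-calico.yml (assumed location)
calico_bpf_enabled: false          # default iptables dataplane

# group_vars/all/all.yml (assumed location)
# Only needed together with calico_bpf_enabled: true (or Cilium's kube-proxy
# replacement); with the iptables dataplane it can stay at its default of false.
# use_localhost_as_kubeapi_loadbalancer: true
```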

@Talangor
Author

Talangor commented May 24, 2022

@cristicalin thanks for the fast response.
I did reinstall with calico_bpf_enabled: false, and the use_localhost_as_kubeapi_loadbalancer line is commented out; I thought you meant disabling BPF.
Should I reinstall with use_localhost_as_kubeapi_loadbalancer: false? As far as I can tell it's disabled by default.
Result: no change, except the API URL changed to the api-service IP address.

Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "10a355bb378aa245368c5c9ac05f3f4045e6aeadc8af25167c8a1a808b70782d": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
Warning FailedCreatePodSandBox 15m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b14793930f295689742ca2112f49174c7634ec121283924955e83925fc2d5898": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
Warning FailedCreatePodSandBox 13m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "20e2ec4dd4d96b68f45d37dfe587e76bf3e19341b353aff57bd28545b2b467c4": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5ff44b8fd7abd7ef5c9aedf2f3aad5407b4c1b1b82e8e171dde1319b4620695a": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
Warning FailedCreatePodSandBox 8m16s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f581e717e804dc43e5f6c0c814efe5270227846edc317a546f2d0e847f0d0dcf": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
Warning FailedCreatePodSandBox 30s (x3 over 5m30s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b5bf26f8e2ac33cf2ce5c01105745670d6e19ea652bb18e27aaab32b7f8cae70": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable

@Talangor
Author

Talangor commented May 25, 2022

The thing is, the default configuration available in kubespray does this too.
My past experience with kubespray was that I could deploy a Kubernetes cluster with the default yaml files and get it to work, but this time I can't get it to work.
I suspected RBAC or a compatibility issue between the Calico and Kubernetes versions, but since it works with use_localhost_as_kubeapi_loadbalancer I discarded that thought.
Excuse me for my lack of knowledge.
Just a thought: isn't this issue due to kube-proxy and kubeadm refusing to serve the API due to policy?

For more info (clarification):

  • **calico kube controller log:**
    W0525 07:23:09.449630 1 reflector.go:436] pkg/mod/github.com/projectcalico/k8s-client-go@v0.21.9-0.20220104180519-6bd7ec39553f/tools/cache/reflector
    2022-05-25 07:23:09.449 [INFO][1] watchercache.go 97: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/n
    2022-05-25 07:23:09.450 [INFO][1] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/ipam/v2/assignment/" error=G
    2022-05-25 07:23:09.450 [INFO][1] watchercache.go 245: Failed to create watcher ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Get "https:/
    2022-05-25 07:23:09.450 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
    2022-05-25 07:23:09.450 [INFO][1] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.o
    2022-05-25 07:23:10.446 [WARNING][1] runconfig.go 161: unable to get KubeControllersConfiguration(default) error=Get "https://10.233.0.1:443/apis/crd.proj
    2022-05-25 07:23:10.450 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
    2022-05-25 07:23:10.450 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
    2022-05-25 07:23:10.451 [INFO][1] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/ipam/v2/assignment/" error=G
    2022-05-25 07:23:10.451 [INFO][1] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.o
    E0525 07:23:10.801148 1 reflector.go:138] pkg/mod/github.com/projectcalico/k8s-client-go@v0.21.9-0.20220104180519-6bd7ec39553f/tools/cache/reflector
    2022-05-25 07:23:11.448 [WARNING][1] runconfig.go 161: unable to get KubeControllersConfiguration(default) error=Get "https://10.233.0.1:443/apis/crd.proj
    2022-05-25 07:23:11.454 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
    2022-05-25 07:23:11.454 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
    2022-05-25 07:23:11.454 [INFO][1] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/ipam/v2/assignment/" error=G
    2022-05-25 07:23:11.454 [INFO][1] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.o
    2022-05-25 07:23:11.708 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.233.0.1:443/
    2022-05-25 07:23:11.708 [ERROR][1] main.go 226: Failed to verify datastore error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformat
    2022-05-25 07:23:11.708 [ERROR][1] main.go 257: Failed to reach apiserver error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformati
    2022-05-25 07:23:12.449 [WARNING][1] runconfig.go 161: unable to get KubeControllersConfiguration(default) error=Get "https://10.233.0.1:443/apis/crd.proj
    2022-05-25 07:23:12.455 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"

  • **calico pods**
    Warning Unhealthy 33m (x2 over 33m) kubelet Readiness probe failed: calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp 127.0.0.1:9099: connect: connection refused
    Warning Unhealthy 33m kubelet Readiness probe failed: calico/node is not ready: felix is not ready: readiness probe reporting 503

  • **other pods**
    Warning FailedCreatePodSandBox 14m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "812cde7dc2338ecec5a205dd437943bba4a6cb21e761ae53d4be1a693acd6814": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
    Warning FailedCreatePodSandBox 11m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d771f1ba4b37cb3d5be2ab5b2441be59242facbbda25c5976a271c4359c4e53e": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
    Warning FailedCreatePodSandBox 9m31s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f0c1377a528f09735df62a331928ef62305182398f1468fc1be8fb2a4bc1a781": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
    Warning FailedCreatePodSandBox 106s (x3 over 6m46s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "594499d0afbb43fcdcc18cfb00d3a9eae92572857322bf91f55c2114e1562911": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable

@Talangor
Author

@cristicalin
I tried to install with Weave and this happened: #8881
What's happening here? Am I so far off? I'm sure you guys have tested the code, but it's really strange.

@Talangor
Author

Talangor commented May 29, 2022

It seems that the proxy settings are propagated down to some process calling the [https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default] API.

Finally I added NO_PROXY for all the private subnets (e.g. 10.233.0.0/16, 10.233.64.0/16) and that fixed the issue.

I suggest putting the cluster domain (.cluster.local) and the network CIDRs in the default no_proxy configuration.
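
For reference, a sketch of what such exceptions could look like in the inventory's group_vars/all/all.yml (the proxy URL is a placeholder; the CIDRs match the values used later in this thread):

```yaml
# group_vars/all/all.yml (sketch)
http_proxy: "http://proxy.example.com:3128"     # placeholder
https_proxy: "http://proxy.example.com:3128"    # placeholder
# Keep the in-cluster ranges and the cluster domain out of the proxy, otherwise
# the CNI plugin's calls to https://10.233.0.1:443 are sent through the proxy:
no_proxy: "10.233.0.0/18,10.233.64.0/18,.cluster.local,localhost,127.0.0.1"
```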

@cristicalin
Contributor

> It seems that the proxy settings are propagated down to some process calling the [https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default] API.
>
> Finally I added NO_PROXY for all the private subnets (e.g. 10.233.0.0/16, 10.233.64.0/16) and that fixed the issue.
>
> I suggest putting the cluster domain (.cluster.local) and the network CIDRs in the default no_proxy configuration.

So this comes from setting http_proxy in your environment? Unfortunately we don't have a CI test case for this scenario, so it's difficult to catch when it breaks. Personally, my environments don't require a proxy, so it's not a part of the code I see often.

If you want to push a PR with the code you changed, we are happy to review and include it.

@Talangor
Author

Talangor commented May 30, 2022

I added it like this in group_vars/all/all.yml:
no_proxy: "node01,node02,node03,node04,node05,localhost,127.0.0.0,127.0.1.1,127.0.1.1,10.233.0.0/18,10.233.64.0/18,.cluster.local,local.home"

But if we want it to use variables, maybe it should be something like this:

in roles/kubespray-defaults/defaults/main.yaml
no_proxy: "{{ no_proxy | default ('{{ kube_service_addresses }}, {{ kube_pods_subnet }}, .{{ cluster_name }}') }}"
NO_PROXY: "{{ no_proxy | default ('{{ kube_service_addresses }}, {{ kube_pods_subnet }}, .{{ cluster_name }}') }}"

in inventory/sample/group_vars/all/all.yml
# Refer to roles/kubespray-defaults/defaults/main.yml before modifying no_proxy
# Make sure you add kube_service_addresses, kube_pods_subnet and cluster_name
no_proxy: "{{ kube_service_addresses }}, {{ kube_pods_subnet }}, {{ cluster_name }}"

Unfortunately, I haven't had access to my test lab for some time now, so it's best if you could review it; if not, I'll add this to my todo list and test it later on.
I'm truly sorry; I should test before giving suggestions, but I'm helpless right now. Maybe it helps a bit, though.
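
One caveat on the templated suggestion above: as far as I can tell, Jinja does not reliably expand nested `{{ ... }}` placeholders inside a quoted `default()` argument, and having no_proxy reference itself in its own role default risks a recursive templating error. A sketch of the same idea without those pitfalls (an illustration only, not necessarily what Kubespray ends up merging):

```yaml
# roles/kubespray-defaults/defaults/main.yml (sketch)
# Role defaults have the lowest variable precedence, so a no_proxy set in the
# inventory overrides this value automatically; no self-referencing default() needed.
no_proxy: "{{ kube_service_addresses }},{{ kube_pods_subnet }},.{{ cluster_name }}"
```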

@Talangor
Author

Talangor commented May 31, 2022

@cristicalin
Update: fortunately, I had the opportunity to test this code, and it's working as expected.

Talangor added a commit to Talangor/kubespray that referenced this issue Jun 6, 2022

**What type of PR is this?**

/kind feature

**What this PR does / why we need it**:
Sets the kube CIDRs in the no_proxy environment variable.

**Which issue(s) this PR fixes**:
Fixes kubernetes-sigs#8864

**Special notes for your reviewer**:
The default configuration does not include no_proxy settings; if one uses the default config and sets proxy settings, pods cannot connect to the API service.
PS: it's my first time creating a PR; I'll include my code below.

**Does this PR introduce a user-facing change?**:
```release-note
NONE
```
**roles/kubespray-defaults/defaults/main.yaml**
`no_proxy: "{{ no_proxy | default ('{{ kube_service_addresses }},{{ kube_pods_subnet }},{{ cluster_name }}') }}"`
`NO_PROXY: "{{ no_proxy | default ('{{ kube_service_addresses }},{{ kube_pods_subnet }},{{ cluster_name }}') }}"`

**inventory/sample/group_vars/all/all.yml**
`## Refer to roles/kubespray-defaults/defaults/main.yml before modifying no_proxy`
`## Make sure you add kube_service_addresses, kube_pods_subnet and cluster_name below or pods cannot connect to API service`
`no_proxy: "{{ kube_service_addresses }}, {{ kube_pods_subnet }}, {{ cluster_name }}"`
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Aug 29, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 28, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 28, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vyom-soft

/reopen

@k8s-ci-robot
Contributor

@vyom-soft: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vyom-soft

Hello, I am seeing the following error:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               72s   default-scheduler  Successfully assigned kube-system/kube-proxy-jhf8d to node5
  Warning  FailedCreatePodSandBox  12s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1f81d650b5e9f17d4a01973d52b53417352babd39290e1267770d6b2141f6a8b": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout

@Talangor
Author

Hi @vyom-soft,
are you using a proxy in your deployment?
If so, you should either set the correct exceptions for your cluster or use an offline installation and avoid the proxy entirely.
In my case, when I was using a proxy, my cluster sent its entire traffic through it, and that caused numerous problems.
