
CNI cilium: Fix and update to v1.15.3 #18846

Merged: spowelljr merged 3 commits into kubernetes:master on May 10, 2024

@spowelljr (Member) commented May 9, 2024

Fixes: #18780

Before:

$ minikube start --driver docker --cni=cilium
😄  minikube v1.33.0 on Darwin 14.4.1 (arm64)
...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS             RESTARTS      AGE
kube-system   cilium-lzcxq                       0/1     CrashLoopBackOff   3 (34s ago)   2m4s
kube-system   cilium-operator-86f4c5579c-lklnm   1/1     Running            0             2m4s
kube-system   coredns-7db6d8ff4d-pqp9g           0/1     CrashLoopBackOff   4 (36s ago)   2m4s
kube-system   etcd-minikube                      1/1     Running            0             2m18s
kube-system   kube-apiserver-minikube            1/1     Running            0             2m18s
kube-system   kube-controller-manager-minikube   1/1     Running            0             2m18s
kube-system   kube-proxy-p8g96                   1/1     Running            0             2m4s
kube-system   kube-scheduler-minikube            1/1     Running            0             2m18s
kube-system   storage-provisioner                1/1     Running            0             2m17s

After:

$ minikube start --driver docker --cni=cilium
😄  minikube v1.33.0 on Darwin 14.4.1 (arm64)
...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE
kube-system   cilium-operator-64664858c8-dq28l   1/1     Running   0             112s
kube-system   cilium-wllgf                       1/1     Running   0             113s
kube-system   coredns-7db6d8ff4d-h75zh           1/1     Running   0             75s
kube-system   etcd-minikube                      1/1     Running   0             2m7s
kube-system   kube-apiserver-minikube            1/1     Running   0             2m7s
kube-system   kube-controller-manager-minikube   1/1     Running   0             2m7s
kube-system   kube-proxy-k5tdv                   1/1     Running   0             113s
kube-system   kube-scheduler-minikube            1/1     Running   0             2m7s
kube-system   storage-provisioner                1/1     Running   1 (82s ago)   2m5s

Verified on the ISO (qemu driver) as well:

$ minikube start --driver qemu --cni=cilium
😄  minikube v1.33.0 on Darwin 14.4.1 (arm64)
...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   cilium-operator-56477b846b-9mhgw   1/1     Running   0          3m5s
kube-system   cilium-tzfmq                       1/1     Running   0          3m6s
kube-system   coredns-7db6d8ff4d-77pgq           1/1     Running   0          3m5s
kube-system   etcd-minikube                      1/1     Running   0          3m20s
kube-system   kube-apiserver-minikube            1/1     Running   0          3m20s
kube-system   kube-controller-manager-minikube   1/1     Running   0          3m20s
kube-system   kube-proxy-k7vsm                   1/1     Running   0          3m6s
kube-system   kube-scheduler-minikube            1/1     Running   0          3m20s
kube-system   storage-provisioner                1/1     Running   0          3m18s

And verified on multi-node as well:

$ minikube start --driver docker --nodes=3 --cni=cilium
😄  minikube v1.33.0 on Darwin 14.4.1 (arm64)
...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE
kube-system   cilium-8zm6w                       1/1     Running   0             84s
kube-system   cilium-b9cjg                       1/1     Running   0             83s
kube-system   cilium-operator-64664858c8-nkrq6   1/1     Running   0             84s
kube-system   cilium-plsq8                       1/1     Running   0             74s
kube-system   coredns-7db6d8ff4d-g9kg7           1/1     Running   0             20s
kube-system   etcd-minikube                      1/1     Running   0             98s
kube-system   kube-apiserver-minikube            1/1     Running   0             99s
kube-system   kube-controller-manager-minikube   1/1     Running   0             98s
kube-system   kube-proxy-5fk9b                   1/1     Running   0             84s
kube-system   kube-proxy-jcgdx                   1/1     Running   0             83s
kube-system   kube-proxy-lbtd5                   1/1     Running   0             74s
kube-system   kube-scheduler-minikube            1/1     Running   0             98s
kube-system   storage-provisioner                1/1     Running   1 (53s ago)   97s
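Each check above comes down to reading kubectl get pods -A and looking for pods that are not fully ready. As a rough illustration only, here is a minimal Go sketch that scans that plain-text output and lists such pods. The function name unhealthyPods is hypothetical, and it assumes the default column order (NAMESPACE, NAME, READY, STATUS, ...) seen in the transcripts; a real check would more likely use kubectl wait or the Kubernetes API than parse text.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unhealthyPods scans plain-text `kubectl get pods -A` output and returns
// "namespace/name (STATUS, READY)" for every pod whose READY count is not
// full or whose STATUS is not Running. Assumes the default column order:
// NAMESPACE NAME READY STATUS RESTARTS AGE.
func unhealthyPods(output string) []string {
	var bad []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 || fields[0] == "NAMESPACE" {
			continue // skip blank lines and the header row
		}
		ns, name, ready, status := fields[0], fields[1], fields[2], fields[3]
		// READY is "<ready>/<total>"; the pod is fully ready when both match.
		parts := strings.SplitN(ready, "/", 2)
		fullyReady := len(parts) == 2 && parts[0] == parts[1]
		if !fullyReady || status != "Running" {
			bad = append(bad, fmt.Sprintf("%s/%s (%s, %s)", ns, name, status, ready))
		}
	}
	return bad
}

func main() {
	// Excerpt of the "Before" output from this PR.
	before := `NAMESPACE     NAME                               READY   STATUS             RESTARTS      AGE
kube-system   cilium-lzcxq                       0/1     CrashLoopBackOff   3 (34s ago)   2m4s
kube-system   cilium-operator-86f4c5579c-lklnm   1/1     Running            0             2m4s
kube-system   coredns-7db6d8ff4d-pqp9g           0/1     CrashLoopBackOff   4 (36s ago)   2m4s`
	for _, p := range unhealthyPods(before) {
		fmt.Println(p)
	}
}
```

Run against the "Before" excerpt, it reports cilium-lzcxq and coredns-7db6d8ff4d-pqp9g, both in CrashLoopBackOff; against the "After" outputs it reports nothing.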

What I did

I generated the Cilium YAML using Helm:

$ helm template cilium cilium/cilium --version 1.15.4 --namespace kube-system > cilium.yaml

Removed the cilium-ca-secret.yaml and server-secret.yaml manifests from the generated output.

Made cluster-pool-ipv4-cidr use a templated value, as done in d3dad9c.
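Templating the pod CIDR means the value is filled in when the manifest is rendered instead of being fixed in the shipped YAML. A minimal sketch with Go's text/template follows; the PodSubnet field, the renderCiliumConfig helper and the 10.244.0.0/16 example value are assumptions for illustration, not necessarily what d3dad9c or pkg/minikube/cni/cilium.go use.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// ciliumConfigTmpl is a hypothetical one-line excerpt of the cilium-config
// ConfigMap in which the hard-coded pod CIDR was replaced by a template action.
const ciliumConfigTmpl = `  cluster-pool-ipv4-cidr: "{{ .PodSubnet }}"
`

// ciliumValues holds the value substituted at render time; the field
// name PodSubnet is an assumption for this sketch.
type ciliumValues struct {
	PodSubnet string
}

// renderCiliumConfig fills in the pod CIDR and returns the rendered YAML.
func renderCiliumConfig(podSubnet string) (string, error) {
	tmpl, err := template.New("cilium-config").Parse(ciliumConfigTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, ciliumValues{PodSubnet: podSubnet}); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderCiliumConfig("10.244.0.0/16") // example CIDR
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```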

And changed the cilium-operator replicas to 1.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 9, 2024
@spowelljr (Member Author) commented:
/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label May 9, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 9, 2024

Review thread on pkg/minikube/cni/cilium.go (outdated, resolved)
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, spowelljr

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@medyagh (Member) commented May 9, 2024

For the ISO (qemu driver), I get this error in the cilium pod logs:

"failed while reinitializing datapath: listing routing rules: address family not supported by protocol"

fixAndUpdateCiliumCNI ✓
$ kc get pods -A
NAMESPACE     NAME                               READY   STATUS             RESTARTS      AGE
kube-system   cilium-cssbc                       0/1     CrashLoopBackOff   7 (99s ago)   13m
kube-system   cilium-operator-64664858c8-xbs79   1/1     Running            0             13m
kube-system   coredns-7db6d8ff4d-xvrmk           0/1     Pending            0             13m
kube-system   etcd-minikube                      1/1     Running            0             13m
kube-system   kube-apiserver-minikube            1/1     Running            0             13m
kube-system   kube-controller-manager-minikube   1/1     Running            0             13m
kube-system   kube-proxy-d96m9                   1/1     Running            0             13m
kube-system   kube-scheduler-minikube            1/1     Running            0             13m
kube-system   storage-provisioner                0/1     Pending            0             13m

Errors from the cilium container log:

$ kc logs cilium-cssbc -n kube-system | grep error
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
time="2024-05-09T23:45:53Z" level=info msg="  --kvstore-max-consecutive-quorum-errors='2'" subsys=daemon
time="2024-05-09T23:45:54Z" level=error msg="Start hook failed" error="daemon creation failed: error while initializing daemon: failed while reinitializing datapath: listing routing rules: address family not supported by protocol" function="cmd.newDaemonPromise.func1 (cmd/daemon_main.go:1686)" subsys=hive
time="2024-05-09T23:45:54Z" level=error msg="Observer job stopped with an error" error="context canceled" func="auth.(*authMapGarbageCollector).handleIdentityChange" name="auth gc-identity-events" subsys=auth
time="2024-05-09T23:45:54Z" level=error msg="Observer job stopped with an error" error="context canceled" func="auth.(*AuthManager).handleAuthRequest" name="auth request-authentication" subsys=auth
time="2024-05-09T23:45:54Z" lev

But it may be specific to the ISO; see cilium/cilium#29965.


@medyagh (Member) commented May 10, 2024

It seems that downgrading to 1.15.3 should work on the ISO as well; see cilium/cilium#31944.

@spowelljr spowelljr changed the title CNI cilium: Fix and update to v1.15.4 CNI cilium: Fix and update to v1.15.3 May 10, 2024
@spowelljr spowelljr force-pushed the fixAndUpdateCiliumCNI branch 4 times, most recently from b972707 to cdc016b, May 10, 2024 22:35
@spowelljr spowelljr force-pushed the fixAndUpdateCiliumCNI branch from cdc016b to c213f42, May 10, 2024 22:37
@spowelljr (Member Author) commented May 10, 2024

Confirmed it now works on the ISO as well:

% kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   cilium-operator-56477b846b-9mhgw   1/1     Running   0          3m5s
kube-system   cilium-tzfmq                       1/1     Running   0          3m6s
kube-system   coredns-7db6d8ff4d-77pgq           1/1     Running   0          3m5s
kube-system   etcd-minikube                      1/1     Running   0          3m20s
kube-system   kube-apiserver-minikube            1/1     Running   0          3m20s
kube-system   kube-controller-manager-minikube   1/1     Running   0          3m20s
kube-system   kube-proxy-k7vsm                   1/1     Running   0          3m6s
kube-system   kube-scheduler-minikube            1/1     Running   0          3m20s
kube-system   storage-provisioner                1/1     Running   0          3m18s

@spowelljr spowelljr merged commit 7777113 into kubernetes:master May 10, 2024
17 of 21 checks passed
@spowelljr spowelljr deleted the fixAndUpdateCiliumCNI branch May 10, 2024 23:07
@minikube-pr-bot commented:

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 18846) |
+----------------+----------+---------------------+
| minikube start | 52.5s    | 53.1s               |
| enable ingress | 26.5s    | 26.8s               |
+----------------+----------+---------------------+

Times for minikube start: 51.3s 54.0s 51.4s 53.8s 51.9s
Times for minikube (PR 18846) start: 51.3s 54.9s 53.4s 51.5s 54.3s

Times for minikube ingress: 29.0s 25.7s 25.0s 24.5s 28.5s
Times for minikube (PR 18846) ingress: 28.0s 24.5s 28.5s 28.0s 25.0s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 18846) |
+----------------+----------+---------------------+
| minikube start | 22.3s    | 21.8s               |
| enable ingress | 22.2s    | 22.3s               |
+----------------+----------+---------------------+

Times for minikube start: 21.0s 21.3s 24.0s 21.2s 24.0s
Times for minikube (PR 18846) start: 23.9s 21.3s 21.2s 21.0s 21.7s

Times for minikube ingress: 21.8s 23.3s 22.8s 21.8s 21.3s
Times for minikube (PR 18846) ingress: 21.8s 23.3s 22.8s 21.8s 21.8s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 18846) |
+----------------+----------+---------------------+
| minikube start | 21.9s    | 21.2s               |
| enable ingress | 39.0s    | 29.6s               |
+----------------+----------+---------------------+

Times for minikube ingress: 31.8s 32.3s 79.8s 31.3s 19.8s
Times for minikube (PR 18846) ingress: 31.8s 32.8s 18.8s 32.8s 31.8s

Times for minikube start: 23.2s 23.2s 20.4s 22.7s 20.0s
Times for minikube (PR 18846) start: 20.5s 23.3s 19.9s 19.9s 22.6s
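The summary tables appear to report the arithmetic mean of the five runs (for example, the kvm2 start times average to 52.48s, shown as 52.5s). The containerd "enable ingress" row is the only large gap, so it is worth checking how much of it comes from a single run. A small Go sketch computing both the mean and the median for that row; the helper names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs (the average of the two middle
// values for an even count) without reordering the caller's slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	// "enable ingress" times, docker driver with containerd runtime.
	base := []float64{31.8, 32.3, 79.8, 31.3, 19.8}
	pr := []float64{31.8, 32.8, 18.8, 32.8, 31.8}
	fmt.Printf("mean:   %.1fs vs %.1fs\n", mean(base), mean(pr))
	fmt.Printf("median: %.1fs vs %.1fs\n", median(base), median(pr))
}
```

It prints means of 39.0s and 29.6s but the same median, 31.8s, for both builds, which suggests the roughly 9.4s difference comes from the single 79.8s baseline run rather than from this PR.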

@minikube-pr-bot commented:

These are the flake rates of all failed tests.

+-------------------------------+------------------------------------------------+----------------+
| Environment                   | Failed Tests                                   | Flake Rate (%) |
+-------------------------------+------------------------------------------------+----------------+
| KVM_Linux_crio                | TestFunctional/parallel/MountCmd/specific-port | 0.00           |
| Hyper-V_Windows               | TestPause/serial/Start                         | 3.66           |
| Hyper-V_Windows               | TestOffline                                    | 3.70           |
| Hyper-V_Windows               | TestScheduledStopWindows                       | 5.56           |
| Hyper-V_Windows               | TestForceSystemdFlag                           | 6.06           |
| Docker_Linux_docker_arm64     | TestErrorSpam/setup                            | 6.92           |
| Docker_Linux_containerd_arm64 | TestErrorSpam/setup                            | 7.50           |
| Docker_Linux_crio_arm64       | TestErrorSpam/setup                            | 7.50           |
| Docker_Linux_crio             | TestErrorSpam/setup                            | 7.50           |
| Docker_Linux                  | TestErrorSpam/setup                            | 7.64           |
| Docker_Linux_containerd       | TestErrorSpam/setup                            | 8.18           |
| KVM_Linux_containerd          | TestErrorSpam/setup                            | 8.81           |
| KVM_Linux_crio                | TestErrorSpam/setup                            | 8.81           |
| KVM_Linux                     | TestErrorSpam/setup                            | 8.81           |
| Hyper-V_Windows               | TestDockerFlags                                | 9.20           |
| Hyper-V_Windows               | TestMultiNode/serial/StopNode                  | 9.26           |
| Hyper-V_Windows               | TestMinikubeProfile                            | 11.11          |
| Hyper-V_Windows               | TestMountStart/serial/RestartStopped           | 11.46          |
| QEMU_macOS                    | TestErrorSpam/setup                            | 15.48          |
| Hyper-V_Windows               | TestCertExpiration                             | 17.50          |
| Hyperkit_macOS                | TestErrorSpam/setup                            | 21.74          |
| Hyper-V_Windows               | TestMultiControlPlane/serial/CopyFile          | 24.07          |
+-------------------------------+------------------------------------------------+----------------+

Development: successfully merging this pull request may close the issue "Coredns and Cilium are getting CrashLoopBackOff".
4 participants