panic: runtime error: invalid memory address or nil pointer dereference #10419

Closed · bschapendonk opened this issue Jun 28, 2024 · 8 comments
Labels: kind/upstream-issue (This issue appears to be caused by an upstream bug)
Milestone: v1.30.3+k3s1

bschapendonk commented Jun 28, 2024

Environmental Info:
K3s Version:

k3s version v1.30.2+k3s1 (aa4794b)
go version go1.22.4

Node(s) CPU architecture, OS, and Version:

Ubuntu 24.04 (proxmox VM 4 cores and 4GB of RAM)
Linux k3s 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

single server/agent
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-ipv6-masq --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 --service-cidr=10.43.0.0/16,2001:cafe:43::/112" INSTALL_K3S_CHANNEL=latest sh -s -

Describe the bug:

Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.770825 4475 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
Jun 28 06:07:18 k3s k3s[4475]: time="2024-06-28T06:07:18Z" level=info msg="Handling backend connection request [k3s]"
Jun 28 06:07:18 k3s k3s[4475]: time="2024-06-28T06:07:18Z" level=info msg="Remotedialer connected to proxy" url="wss://10.185.4.103:6443/v1-k3s/connect"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.779773 4475 kube.go:622] List of node(k3s) annotations: map[string]string{"alpha.kubernetes.io/provided-node-ip":"10.185.4.103,2001:4c3c:4203:d101:be24:11ff:fe9b:c667", "flannel.alpha.coreos.com/backend-data":"{\"VNI\":1,\"VtepMAC\":\"a2:67:3a:dd:db:be\"}", "flannel.alpha.coreos.com/backend-type":"vxlan", "flannel.alpha.coreos.com/backend-v6-data":"{\"VNI\":1,\"VtepMAC\":\"be:78:fc:eb:a7:6a\"}", "flannel.alpha.coreos.com/kube-subnet-manager":"true", "flannel.alpha.coreos.com/public-ip":"10.185.4.103", "flannel.alpha.coreos.com/public-ipv6":"2001:4c3c:4203:d101:be24:11ff:fe9b:c667", "k3s.io/hostname":"k3s", "k3s.io/internal-ip":"10.185.4.103,2001:4c3c:4203:d101:be24:11ff:fe9b:c667", "k3s.io/node-args":"[\"server\",\"--flannel-ipv6-masq\",\"--cluster-cidr\",\"10.42.0.0/16,2001:cafe:42::/56\",\"--service-cidr\",\"10.43.0.0/16,2001:cafe:43::/112\"]", "k3s.io/node-config-hash":"HTV3HTKG5VUZXZNOT74AVGPJRQG5LXNGET7JCI6I2HCLVZ5THK5Q====", "k3s.io/node-env":"{\"K3S_DATA_DIR\":\"/var/lib/rancher/k3s/data/82142f5157c67effc219aeefe0bc03e0460fc62b9fbae9e901270c86b5635d53\"}", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.779826 4475 vxlan.go:155] Interface flannel.1 mac address set to: a2:67:3a:dd:db:be
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.780061 4475 vxlan.go:183] Interface flannel-v6.1 mac address set to: be:78:fc:eb:a7:6a
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.780432 4475 iptables.go:51] Starting flannel in iptables mode...
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.780466 4475 iptables.go:115] Current network or subnet (10.42.0.0/16, 10.42.0.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.790479 4475 apiserver.go:52] "Watching apiserver"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.810617 4475 topology_manager.go:215] "Topology Admit Handler" podUID="cd572ac9-aec0-49d3-8f1c-2df861d0e20a" podNamespace="kube-system" podName="metrics-server-557ff575fb-nhpbg"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.810769 4475 topology_manager.go:215] "Topology Admit Handler" podUID="511a9a41-bbd7-40a3-9da5-8e6afd8491e0" podNamespace="kube-system" podName="coredns-576bfc4dc7-c8rmq"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.810960 4475 topology_manager.go:215] "Topology Admit Handler" podUID="fba25a76-196c-4bea-bfdb-d06b982f930a" podNamespace="flux-system" podName="helm-controller-76dff45854-qtlwv"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811077 4475 topology_manager.go:215] "Topology Admit Handler" podUID="4a622ba8-a457-413e-8569-791850104922" podNamespace="flux-system" podName="source-controller-54c89dcbf6-57sb4"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811168 4475 topology_manager.go:215] "Topology Admit Handler" podUID="0c9824b6-948b-4459-934d-995fc971f7b7" podNamespace="flux-system" podName="kustomize-controller-6bc5d5b96-58dr9"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811259 4475 topology_manager.go:215] "Topology Admit Handler" podUID="a95870af-e31c-4401-8a2e-82a3a446d875" podNamespace="flux-system" podName="notification-controller-7f5cd7fdb8-gmdf9"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811347 4475 topology_manager.go:215] "Topology Admit Handler" podUID="0ef526c2-3179-42b1-9dbc-9d10f70297d6" podNamespace="smokeping" podName="smokeping-deployment-6488cd6c-smfrx"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811431 4475 topology_manager.go:215] "Topology Admit Handler" podUID="e4773d80-5c20-44cb-a374-756167b3cbc5" podNamespace="ripe-atlas-probe" podName="ripe-atlas-probe-deployment-c96fbd8d4-2v4tw"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811516 4475 topology_manager.go:215] "Topology Admit Handler" podUID="287f3c8a-61d8-4395-bb5f-16f4fde314a4" podNamespace="kube-system" podName="traefik-ff9948bcb-wxtk8"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811603 4475 topology_manager.go:215] "Topology Admit Handler" podUID="e38543fd-45f4-4814-9793-9ce70fa53e6a" podNamespace="freshrss" podName="freshrss-54595cb597-szmnq"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811699 4475 topology_manager.go:215] "Topology Admit Handler" podUID="30eb036d-b899-4430-b10c-6c3befc32005" podNamespace="cert-manager" podName="cert-manager-cainjector-7477d56b47-nr4rl"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811815 4475 topology_manager.go:215] "Topology Admit Handler" podUID="802cce8d-40cb-4d64-a6ff-4962cf693f3b" podNamespace="cert-manager" podName="cert-manager-webhook-6d5cb854fc-tr9s4"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.811926 4475 topology_manager.go:215] "Topology Admit Handler" podUID="f5937a87-8da5-4b50-9621-2df7ada30955" podNamespace="cert-manager" podName="cert-manager-7ddc8df95d-xhzzp"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.812039 4475 topology_manager.go:215] "Topology Admit Handler" podUID="205b0e6f-34ce-4f00-a2c1-e70878c7be3f" podNamespace="kube-system" podName="helm-install-traefik-crd-s6wlg"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.812137 4475 topology_manager.go:215] "Topology Admit Handler" podUID="f2f989ac-36d8-43d3-9502-e6483845e829" podNamespace="kube-system" podName="helm-install-traefik-kpwrm"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.812233 4475 topology_manager.go:215] "Topology Admit Handler" podUID="233ef31e-ae96-4c66-bb26-aa794a01305c" podNamespace="kube-system" podName="local-path-provisioner-86f46b7bf7-k6k5f"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.812326 4475 topology_manager.go:215] "Topology Admit Handler" podUID="af6763b2-da4c-4d99-a8a6-25c5c9baca67" podNamespace="kube-system" podName="svclb-traefik-5fb09115-lsr9b"
Jun 28 06:07:18 k3s k3s[4475]: I0628 06:07:18.819482 4475 iptables.go:125] Setting up masking rules
Jun 28 06:07:18 k3s k3s[4475]: panic: runtime error: invalid memory address or nil pointer dereference
Jun 28 06:07:18 k3s k3s[4475]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x597bdb]
Jun 28 06:07:18 k3s k3s[4475]: goroutine 10280 [running]:
Jun 28 06:07:18 k3s k3s[4475]: math/big.(*Int).Cmp(0x0?, 0x62623a1?)
Jun 28 06:07:18 k3s k3s[4475]: /usr/local/go/src/math/big/int.go:381 +0x1b
Jun 28 06:07:18 k3s k3s[4475]: github.com/flannel-io/flannel/pkg/ip.IP6Net.Equal(...)
Jun 28 06:07:18 k3s k3s[4475]: /go/pkg/mod/github.com/flannel-io/flannel@v0.25.2/pkg/ip/ip6net.go:192
Jun 28 06:07:18 k3s k3s[4475]: github.com/flannel-io/flannel/pkg/trafficmngr/iptables.(*IPTablesManager).SetupAndEnsureMasqRules(0xc004e3b710, {0x71ea980, 0xc00179a8c0}, {0xa2a0000, 0x10}, {0x0, 0x0}, {0x0, 0x0}, {0xc00238e100, ...}, ...)
Jun 28 06:07:18 k3s k3s[4475]: /go/pkg/mod/github.com/flannel-io/flannel@v0.25.2/pkg/trafficmngr/iptables/iptables.go:131 +0x438
Jun 28 06:07:18 k3s k3s[4475]: github.com/k3s-io/k3s/pkg/agent/flannel.flannel({0x71ea980, 0xc00179a8c0}, 0xc0059f9fd0?, {0xc0093b2640, 0x34}, {0xc0100fddd0, 0x2d}, 0x1, 0xb)
Jun 28 06:07:18 k3s k3s[4475]: /go/src/github.com/k3s-io/k3s/pkg/agent/flannel/flannel.go:103 +0x493
Jun 28 06:07:18 k3s k3s[4475]: github.com/k3s-io/k3s/pkg/agent/flannel.Run.func1()
Jun 28 06:07:18 k3s k3s[4475]: /go/src/github.com/k3s-io/k3s/pkg/agent/flannel/setup.go:78 +0x46
Jun 28 06:07:18 k3s k3s[4475]: created by github.com/k3s-io/k3s/pkg/agent/flannel.Run in goroutine 1
Jun 28 06:07:18 k3s k3s[4475]: /go/src/github.com/k3s-io/k3s/pkg/agent/flannel/setup.go:77 +0x152
Jun 28 06:07:18 k3s systemd[1]: k3s.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 28 06:07:18 k3s systemd[1]: k3s.service: Failed with result 'exit-code'.
Jun 28 06:07:18 k3s systemd[1]: k3s.service: Consumed 9.643s CPU time.
Jun 28 06:07:23 k3s systemd[1]: k3s.service: Scheduled restart job, restart counter is at 9.
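
The math/big.(*Int).Cmp frame above shows a 0x0 receiver, i.e. flannel's IP6Net.Equal (pkg/ip/ip6net.go:192) is comparing an IPv6 network whose address was never set. A minimal sketch of that failure mode, using a simplified stand-in for flannel's IP6Net type (field names here are illustrative, not the upstream definition):

```go
package main

import "math/big"

// Simplified stand-in for flannel's ip.IP6Net; the real type lives in
// pkg/ip/ip6net.go and this is not its exact definition.
type IP6Net struct {
	IP        *big.Int
	PrefixLen uint
}

// Equal mirrors the comparison that panics at ip6net.go:192: it calls
// (*big.Int).Cmp without first checking whether either IP is nil.
func (n IP6Net) Equal(o IP6Net) bool {
	return n.IP.Cmp(o.IP) == 0 && n.PrefixLen == o.PrefixLen
}

func main() {
	var unset IP6Net // zero value, e.g. when no previous IPv6 network was recorded
	configured := IP6Net{IP: big.NewInt(0xcafe), PrefixLen: 56}
	// Cmp runs with a nil *big.Int receiver and dereferences it:
	// panic: runtime error: invalid memory address or nil pointer dereference
	_ = unset.Equal(configured)
}
```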

Steps To Reproduce:

  • Upgrading K3s and restarting the system:
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-ipv6-masq --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 --service-cidr=10.43.0.0/16,2001:cafe:43::/112" INSTALL_K3S_CHANNEL=latest sh -s -

Expected behavior:
k3s starts

Actual behavior:
k3s goes into a restart loop and never starts

Additional context / logs:

Downgrading with the following command works:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-ipv6-masq --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 --service-cidr=10.43.0.0/16,2001:cafe:43::/112" INSTALL_K3S_VERSION=v1.30.1+k3s1 sh -s -

Might be related to flannel-io/flannel#1968 / flannel-io/flannel#1969, which are part of https://github.com/flannel-io/flannel/releases/tag/v0.25.2

brandond (Member) commented Jun 28, 2024

Yes, appears to be a bug in flannel - cc @manuelbuil - any reason we didn't bump flannel to 0.25.4 for this cycle?

brandond added the kind/upstream-issue label Jun 28, 2024
brandond moved this from New to In Triage in K3s Development Jun 28, 2024
brandond added this to the v1.30.3+k3s1 milestone Jun 28, 2024

rbrtbnfgl (Contributor) commented Jun 28, 2024

The issue is not related to the fix done in flannel's main.go file, which is not used by K3s. It is caused by that fix not also being ported to K3s's flannel.go file.
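
For context, a defensive nil check of the kind such a port would need might look like the sketch below. This is illustrative only, not the actual flannel or k3s patch, and it reuses the simplified IP6Net stand-in from the sketch above:

```go
package main

import (
	"fmt"
	"math/big"
)

// Same simplified stand-in for flannel's ip.IP6Net as in the earlier sketch.
type IP6Net struct {
	IP        *big.Int
	PrefixLen uint
}

// Illustrative only, not the actual upstream patch: guarding the nil case
// lets a zero-value (unset) network be compared without a nil dereference.
func ip6NetsEqual(a, b IP6Net) bool {
	if a.IP == nil || b.IP == nil {
		return a.IP == nil && b.IP == nil && a.PrefixLen == b.PrefixLen
	}
	return a.IP.Cmp(b.IP) == 0 && a.PrefixLen == b.PrefixLen
}

func main() {
	var unset IP6Net
	configured := IP6Net{IP: big.NewInt(0xcafe), PrefixLen: 56}
	fmt.Println(ip6NetsEqual(unset, configured)) // false, no panic
	fmt.Println(ip6NetsEqual(unset, IP6Net{}))   // true
}
```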

manuelbuil (Contributor) commented:

> Yes, appears to be a bug in flannel - cc @manuelbuil - any reason we didn't bump flannel to 0.25.4 for this cycle?

IIRC 0.25.3 and 0.25.4 are mostly fixes for Windows code. It seems the bug is in the flannel code in k3s, which was not adapted to a change that happened around iptables.

kyrofa commented Jul 1, 2024

This is also broken in v1.29.6+k3s1. I was really hoping to get the fix for #9957 in there, but k3s doesn't even fire up because of this. Had to roll back to v1.29.5+k3s1.

How did this make it to stable? There must be situations where it doesn't crash in a loop?

rbrtbnfgl moved this from Peer Review to To Test in K3s Development Jul 1, 2024

brandond (Member) commented Jul 1, 2024

@kyrofa It only crashes when you use the embedded flannel with an ipv6 cluster-cidr and enable --flannel-ipv6-masq, which is disabled by default. We don't currently have any tests that cover this non-default configuration, but will be adding some in response to this regression.

kyrofa commented Jul 1, 2024

Ah ha, thank you @brandond, much appreciated.

bschapendonk (Author) commented:

This morning I upgraded to v1.30.2-rc1+k3s2, which is working fine.
Thank you for fixing this 👍

VestigeJ commented:

$ /var/lib/rancher/k3s/data/current/bin/flannel -V

CNI Plugin flannel version v1.4.0-flannel1+v0.25.4 (linux/amd64) commit HEAD built on 2024-07-18T00:33:12Z
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0

$ k3s -v

k3s version v1.30.3-rc1+k3s1 (086d9ca0)
go version go1.22.5
