Upgrade of a Docker installation causes a panic, but restarts and is functional after restart #33685

Open
kinarashah opened this issue Jul 19, 2021 · 4 comments
Labels
area/provisioning-v2: Provisioning issues that are specific to the provisioningv2 generating framework
kind/bug-qa: Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement
QA/need-info
team/hostbusters: The team that is responsible for provisioning/managing downstream clusters + K8s version support

Comments

@kinarashah
Member

Upgraded the Rancher server from v2.5.9 to master-head 2f7d673b5193. Saw the panic below in the logs; the container restarted and was fine afterwards.

```
E0719 19:15:33.421340      36 leaderelection.go:361] Failed to update lock: Put "https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s": context deadline exceeded
I0719 19:15:33.497822      36 event.go:291] "Event occurred" object="kube-system/kube-controller-manager" kind="Endpoints" apiVersion="v1" type="Normal" reason="LeaderElection" message="d07635975132_ada790ac-2121-4891-9365-bf5de306fdbf stopped leading"
I0719 19:15:33.497919      36 event.go:291] "Event occurred" object="kube-system/kube-controller-manager" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="d07635975132_ada790ac-2121-4891-9365-bf5de306fdbf stopped leading"
I0719 19:15:33.514108      36 leaderelection.go:278] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
F0719 19:15:33.518947      36 controllermanager.go:293] leaderelection lost
goroutine 2019 [running]:
github.com/rancher/k3s/vendor/k8s.io/klog/v2.stacks(0xc000128001, 0xc0236fe2a0, 0x4c, 0xde)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).output(0x6ef5aa0, 0xc000000003, 0x0, 0x0, 0xc01cfdcfc0, 0x6b8ad4a, 0x14, 0x125, 0x0)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:945 +0x191
github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).printf(0x6ef5aa0, 0x3, 0x0, 0x0, 0x4514b1d, 0x13, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:733 +0x17a
github.com/rancher/k3s/vendor/k8s.io/klog/v2.Fatalf(...)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:1456
github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app.Run.func2()
	/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:293 +0x73
github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc00a3f7d40)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0x29
github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc00a3f7d40, 0x4c77280, 0xc00f744300)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:210 +0x15d
github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection.RunOrDie(0x4c772c0, 0xc000124018, 0x4c979c0, 0xc00ed42220, 0x37e11d600, 0x2540be400, 0x77359400, 0xc00ed42200, 0x46ecc48, 0x0, ...)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:222 +0x9c
github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app.Run(0xc00f12c4f0, 0xc00010e660, 0xc006f0d350, 0xc006f34be8)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:285 +0x979
github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app.NewControllerManagerCommand.func2(0xc0081dcb00, 0xc00e5cbad0, 0x0, 0xd)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:124 +0x2b7
github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).execute(0xc0081dcb00, 0xc003494200, 0xd, 0x10, 0xc0081dcb00, 0xc003494200)
	/go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0081dcb00, 0x0, 0x0, 0xc002cf4340)
	/go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:887
github.com/rancher/k3s/pkg/daemons/executor.Embedded.ControllerManager.func1(0xc006f2ecc0, 0xc0081dcb00)
	/go/src/github.com/rancher/k3s/pkg/daemons/executor/embed.go:79 +0x46
created by github.com/rancher/k3s/pkg/daemons/executor.Embedded.ControllerManager
	/go/src/github.com/rancher/k3s/pkg/daemons/executor/embed.go:77 +0x7f
```
@sowmyav27 assigned sowmyav27 and unassigned sowmyav27 Jul 19, 2021
@sowmyav27 added the kind/bug-qa label Jul 19, 2021
@sowmyav27 added this to the v2.6 milestone Jul 19, 2021
@StrongMonkey
Contributor

@kinarashah Is this a single Docker install or an HA install?

@StrongMonkey
Contributor

Looks like a single Docker install. In my view this is not a release-blocker bug. What most likely happened is that during the Rancher upgrade the Rancher process received SIGTERM, its contexts were cancelled, and that caused leader election to fail and produced the logs above. As long as it came back up normally there is no problem.
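
For reference, the fatal line in the trace comes from client-go's leader-election contract: kube-controller-manager's OnStoppedLeading callback calls klog.Fatalf("leaderelection lost") once the lease can no longer be renewed, and the process is expected to exit and be restarted by its supervisor. A minimal sketch of that pattern (the in-cluster config and the "example-controller" Lease name are illustrative assumptions, not the actual k3s wiring):

```go
package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	// Assumption: running in-cluster; the real controller manager builds its
	// client and lock from its own flags instead.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatalf("config: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname()
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"kube-system", "example-controller", // hypothetical Lease name
		client.CoreV1(), client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: id},
	)
	if err != nil {
		klog.Fatalf("lock: %v", err)
	}

	// 15s / 10s / 2s are the upstream defaults and match the durations visible
	// in the RunOrDie frame of the stack trace above.
	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Controllers run here; ctx is cancelled when leadership is lost
				// or the process is shutting down (e.g. SIGTERM during an upgrade).
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// This is the call that produces "F... leaderelection lost":
				// a renew that cannot complete within RenewDeadline is fatal,
				// and the container restart brings the process back.
				klog.Fatalf("leaderelection lost")
			},
		},
	})
}
```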

@kinarashah
Member Author

@StrongMonkey Yep, it was a single-node install. I agree this is not a release blocker because it restarts. @sowmyav27 suggested still opening this issue so we can keep track of it.

@deniseschannon modified the milestones: v2.6, v2.6.1 Aug 5, 2021
@deniseschannon changed the title from "Rancher server upgrade to master-head panics" to "Upgrade of a Docker installation causes a panic, but restarts and is functional after restart" Aug 5, 2021
@deniseschannon added the area/provisioning-v2 label and removed the area/rke2 (RKE2-related Issues) label Aug 11, 2021
@deniseschannon modified the milestones: v2.6.1, v2.6.2 Sep 1, 2021
@Jono-SUSE-Rancher modified the milestones: v2.6.2, v2.6.3 Oct 18, 2021
@deniseschannon removed this from the v2.6.3 milestone Nov 18, 2021
zube bot added the team/hostbusters label Dec 21, 2022
@WMP

WMP commented Jan 19, 2023

I have the same problem on v1.19.16-rancher1-5 and v1.19.16-rancher2-1 without the Rancher panel (cattle-system): pure Kubernetes installed by RKE on 3 etcd/controlplane nodes and 13 workers. I did not see this problem on v1.18.20-rancher1-3. The scheduler runs with the default configuration, and the error only affects the scheduler instance that is the leader. The scheduler doesn't just crash during a Kubernetes upgrade; it crashes every 10 minutes after becoming the leader, during normal operation.
crash.log
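
The durations in the RunOrDie frame of the trace appear to decode to the upstream defaults (15s lease duration, 10s renew deadline, 2s retry period), so any apiserver/etcd stall longer than about 10s during a renewal is enough to trigger the same fatal exit. A small diagnostic sketch along those lines, assuming the default kube-system/kube-scheduler Lease and a kubeconfig at the default path (both assumptions; adjust for your setup), polls the Lease and prints how long ago it was renewed, which helps separate slow renewals from outright scheduler crashes:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: kubeconfig at the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		lease, err := client.CoordinationV1().Leases("kube-system").Get(
			context.TODO(), "kube-scheduler", metav1.GetOptions{})
		switch {
		case err != nil:
			fmt.Println("get lease:", err)
		case lease.Spec.RenewTime == nil || lease.Spec.HolderIdentity == nil:
			fmt.Println("lease has no holder/renew time yet")
		default:
			// A gap approaching the 10s renew deadline means the holder is about
			// to log "failed to renew lease" and exit, as in the trace above.
			gap := time.Since(lease.Spec.RenewTime.Time)
			fmt.Printf("holder=%s last renewed %s ago\n",
				*lease.Spec.HolderIdentity, gap.Round(time.Millisecond))
		}
		time.Sleep(2 * time.Second)
	}
}
```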
