Upgrade of a Docker installation causes a panic, but restarts and is functional after restart #33685

Open
kinarashah opened this issue Jul 19, 2021 · 4 comments
Labels
area/provisioning-v2: Provisioning issues that are specific to the provisioningv2 generating framework
kind/bug-qa: Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement
QA/need-info
team/hostbusters: The team that is responsible for provisioning/managing downstream clusters + K8s version support

Comments

@kinarashah
Member

Upgraded the Rancher server from v2.5.9 to master-head 2f7d673b5193. Saw the panic below in the logs; the container restarted and was fine afterwards.

```
E0719 19:15:33.421340      36 leaderelection.go:361] Failed to update lock: Put "https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s": context deadline exceeded
I0719 19:15:33.497822      36 event.go:291] "Event occurred" object="kube-system/kube-controller-manager" kind="Endpoints" apiVersion="v1" type="Normal" reason="LeaderElection" message="d07635975132_ada790ac-2121-4891-9365-bf5de306fdbf stopped leading"
I0719 19:15:33.497919      36 event.go:291] "Event occurred" object="kube-system/kube-controller-manager" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="d07635975132_ada790ac-2121-4891-9365-bf5de306fdbf stopped leading"
I0719 19:15:33.514108      36 leaderelection.go:278] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
F0719 19:15:33.518947      36 controllermanager.go:293] leaderelection lost
goroutine 2019 [running]:
github.com/rancher/k3s/vendor/k8s.io/klog/v2.stacks(0xc000128001, 0xc0236fe2a0, 0x4c, 0xde)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).output(0x6ef5aa0, 0xc000000003, 0x0, 0x0, 0xc01cfdcfc0, 0x6b8ad4a, 0x14, 0x125, 0x0)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:945 +0x191
github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).printf(0x6ef5aa0, 0x3, 0x0, 0x0, 0x4514b1d, 0x13, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:733 +0x17a
github.com/rancher/k3s/vendor/k8s.io/klog/v2.Fatalf(...)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:1456
github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app.Run.func2()
	/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:293 +0x73
github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc00a3f7d40)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0x29
github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc00a3f7d40, 0x4c77280, 0xc00f744300)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:210 +0x15d
github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection.RunOrDie(0x4c772c0, 0xc000124018, 0x4c979c0, 0xc00ed42220, 0x37e11d600, 0x2540be400, 0x77359400, 0xc00ed42200, 0x46ecc48, 0x0, ...)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:222 +0x9c
github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app.Run(0xc00f12c4f0, 0xc00010e660, 0xc006f0d350, 0xc006f34be8)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:285 +0x979
github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app.NewControllerManagerCommand.func2(0xc0081dcb00, 0xc00e5cbad0, 0x0, 0xd)
	/go/src/github.com/rancher/k3s/vendor/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:124 +0x2b7
github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).execute(0xc0081dcb00, 0xc003494200, 0xd, 0x10, 0xc0081dcb00, 0xc003494200)
	/go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0081dcb00, 0x0, 0x0, 0xc002cf4340)
	/go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/rancher/k3s/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/rancher/k3s/vendor/github.com/spf13/cobra/command.go:887
github.com/rancher/k3s/pkg/daemons/executor.Embedded.ControllerManager.func1(0xc006f2ecc0, 0xc0081dcb00)
	/go/src/github.com/rancher/k3s/pkg/daemons/executor/embed.go:79 +0x46
created by github.com/rancher/k3s/pkg/daemons/executor.Embedded.ControllerManager
	/go/src/github.com/rancher/k3s/pkg/daemons/executor/embed.go:77 +0x7f
```
@sowmyav27 assigned sowmyav27 and unassigned sowmyav27 Jul 19, 2021
@sowmyav27 added the kind/bug-qa label Jul 19, 2021
@sowmyav27 added this to the v2.6 milestone Jul 19, 2021
@StrongMonkey
Contributor

@kinarashah Is this a single Docker install or an HA install?

@StrongMonkey
Contributor

Looks like a single Docker install. In my view this is not a release-blocker bug. What most likely happened is that during the Rancher upgrade the Rancher process received SIGTERM, its contexts were cancelled, and that caused leader election to fail and produced the logs above. As long as it came back up normally there is no problem.
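
For reference, the fatal line in the trace comes from client-go's leader-election contract: kube-controller-manager's OnStoppedLeading callback calls klog.Fatalf("leaderelection lost") once the lease can no longer be renewed, and the process is expected to exit and be restarted by its supervisor. A minimal sketch of that pattern (the in-cluster config and the "example-controller" Lease name are illustrative assumptions, not the actual k3s wiring):

```go
package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	// Assumption: running in-cluster; the real controller manager builds its
	// client and lock from its own flags instead.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatalf("config: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname()
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"kube-system", "example-controller", // hypothetical Lease name
		client.CoreV1(), client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: id},
	)
	if err != nil {
		klog.Fatalf("lock: %v", err)
	}

	// 15s / 10s / 2s are the upstream defaults and match the durations visible
	// in the RunOrDie frame of the stack trace above.
	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Controllers run here; ctx is cancelled when leadership is lost
				// or the process is shutting down (e.g. SIGTERM during an upgrade).
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// This is the call that produces "F... leaderelection lost":
				// a renew that cannot complete within RenewDeadline is fatal,
				// and the container restart brings the process back.
				klog.Fatalf("leaderelection lost")
			},
		},
	})
}
```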

@kinarashah
Member Author

@StrongMonkey Yep, it was a single-node install. I agree this is not a release blocker because it restarts. @sowmyav27 suggested still opening this issue so we can keep track of it.

@deniseschannon modified the milestones: v2.6, v2.6.1 Aug 5, 2021
@deniseschannon changed the title from "Rancher server upgrade to master-head panics" to "Upgrade of a Docker installation causes a panic, but restarts and is functional after restart" Aug 5, 2021
@deniseschannon added the area/provisioning-v2 label and removed the area/rke2 (RKE2-related Issues) label Aug 11, 2021
@deniseschannon modified the milestones: v2.6.1, v2.6.2 Sep 1, 2021
@Jono-SUSE-Rancher modified the milestones: v2.6.2, v2.6.3 Oct 18, 2021
@deniseschannon removed this from the v2.6.3 milestone Nov 18, 2021
zube bot added the team/hostbusters label Dec 21, 2022
@WMP

WMP commented Jan 19, 2023

I have the same problem on v1.19.16-rancher1-5 and v1.19.16-rancher2-1 without the Rancher panel (cattle-system): pure Kubernetes installed by RKE on 3 etcd/controlplane nodes and 13 workers. I did not see this problem on v1.18.20-rancher1-3. The scheduler runs with the default configuration, and the error only affects the scheduler instance that is the leader. The scheduler doesn't just crash during a Kubernetes upgrade; it crashes every 10 minutes after becoming the leader, during normal operation.
crash.log
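
The durations in the RunOrDie frame of the trace appear to decode to the upstream defaults (15s lease duration, 10s renew deadline, 2s retry period), so any apiserver/etcd stall longer than about 10s during a renewal is enough to trigger the same fatal exit. A small diagnostic sketch along those lines, assuming the default kube-system/kube-scheduler Lease and a kubeconfig at the default path (both assumptions; adjust for your setup), polls the Lease and prints how long ago it was renewed, which helps separate slow renewals from outright scheduler crashes:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: kubeconfig at the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		lease, err := client.CoordinationV1().Leases("kube-system").Get(
			context.TODO(), "kube-scheduler", metav1.GetOptions{})
		switch {
		case err != nil:
			fmt.Println("get lease:", err)
		case lease.Spec.RenewTime == nil || lease.Spec.HolderIdentity == nil:
			fmt.Println("lease has no holder/renew time yet")
		default:
			// A gap approaching the 10s renew deadline means the holder is about
			// to log "failed to renew lease" and exit, as in the trace above.
			gap := time.Since(lease.Spec.RenewTime.Time)
			fmt.Printf("holder=%s last renewed %s ago\n",
				*lease.Spec.HolderIdentity, gap.Round(time.Millisecond))
		}
		time.Sleep(2 * time.Second)
	}
}
```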
