Move webhook setup before manager start #1087

yiqigao217 · 2020-09-09T19:02:36Z

According to
kubernetes-sigs/controller-runtime#1148,
there is a race condition between manager start and webhook registration.
This commit moves webhook setup before manager start.

Tested on a GKE cluster. The nil pointer error did not occur in 20
runs.
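
The ordering change described above can be illustrated with a small, stdlib-only model. This is a sketch under stated assumptions, not the HNC code or the controller-runtime API: the `server` type, its `Register`/`Start`/`Serve` methods, and the `/validate-hierarchy` path are all hypothetical stand-ins. The real race surfaced as a nil pointer error; the model replaces it with an explicit error so the ordering requirement is visible.

```go
package main

import "fmt"

// handlerFunc is a hypothetical stand-in for an admission webhook handler.
type handlerFunc func(req string) string

// server is a simplified, hypothetical model of a webhook server owned by
// a manager. It is NOT the controller-runtime type.
type server struct {
	handlers map[string]handlerFunc
	started  bool
}

// Register adds a handler. In this model, registering after Start is
// rejected, standing in for the unsafe interleaving described in the PR.
func (s *server) Register(path string, h handlerFunc) error {
	if s.started {
		return fmt.Errorf("cannot register %q: server already started", path)
	}
	if s.handlers == nil {
		s.handlers = make(map[string]handlerFunc)
	}
	s.handlers[path] = h
	return nil
}

// Start marks the server as running; from here on the handler set is fixed.
func (s *server) Start() { s.started = true }

// Serve dispatches a request to the handler registered for path.
func (s *server) Serve(path, req string) (string, error) {
	h, ok := s.handlers[path]
	if !ok {
		return "", fmt.Errorf("no handler registered for %q", path)
	}
	return h(req), nil
}

// run performs all webhook setup first and only then starts the server,
// mirroring the ordering this PR adopts.
func run() (string, error) {
	s := &server{}
	err := s.Register("/validate-hierarchy", func(req string) string {
		return "allowed: " + req
	})
	if err != nil {
		return "", err
	}
	s.Start()
	return s.Serve("/validate-hierarchy", "ns-a")
}

func main() {
	out, err := run()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because every `Register` call completes before `Start`, nothing mutates the handler set while the server is running, which is the property the reordering in main.go aims for.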

yiqigao217 · 2020-09-09T19:05:55Z

/assign @adrianludwin
/assign @rjbez17

adrianludwin

lgtm with minor nits. You don't need to test another 20 times after making my suggested changes, once will be fine :)

incubator/hnc/cmd/manager/main.go

yiqigao217 · 2020-09-09T19:19:59Z

> lgtm with minor nits. You don't need to test another 20 times after making my suggested changes, once will be fine :)

Updated. BTW, I tested 20 times because last time it took about 20 attempts to reproduce the race condition.

adrianludwin

/lgtm
/approve

@rjbez17 this is a fairly straightforward fix, lmk if you have any concerns about it but I'm just approving for now.

k8s-ci-robot · 2020-09-09T19:20:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianludwin, yiqigao217

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [adrianludwin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

yiqigao217 · 2020-09-09T19:24:39Z

/retest

Looking at the object reconciler issue now.

@adrianludwin

This essentially reverts kubernetes-retired#1087, which breaks HNC on new clusters that haven't previously had HNC installed. It fixes the nondeterministic crashing problem by patching in kubernetes-sigs/controller-runtime#1155, which has been applied to controller-runtime 0.6.3 in @adrianludwin's repo. This is a temporary hack and will be removed when controller-runtime releases its own fix - likely 0.6.4.

Tested: with the reversion of kubernetes-retired#1087 (main.go), HNC can be installed on a fresh cluster again but fails to start up ~50% of the time. With the fix to controller-runtime, it passes on 20/20 startup attempts. Ran e2e tests and got the same result as without this change (four failures).

k8s-ci-robot added the "cncf-cla: yes" label on Sep 9, 2020 (indicates the PR's author has signed the CNCF CLA)

k8s-ci-robot requested review from adrianludwin and Fei-Guo September 9, 2020 19:02

k8s-ci-robot added the "size/M" label on Sep 9, 2020 (denotes a PR that changes 30-99 lines, ignoring generated files)

k8s-ci-robot assigned adrianludwin and rjbez17 Sep 9, 2020

adrianludwin reviewed Sep 9, 2020

yiqigao217 force-pushed the webhookrace branch from 3e4a0f2 to 619d6e0 Compare September 9, 2020 19:17

adrianludwin reviewed Sep 9, 2020

k8s-ci-robot added the "lgtm" label on Sep 9, 2020 (indicates that a PR is ready to be merged)

k8s-ci-robot added the "approved" label on Sep 9, 2020 (indicates a PR has been approved by an approver from all required OWNERS files)

k8s-ci-robot merged commit ec7fbb5 into kubernetes-retired:master Sep 9, 2020

adrianludwin mentioned this pull request Sep 21, 2020

Fix HNC startup issues #1127

Merged

adrianludwin added this to the hnc-v0.6 milestone Sep 23, 2020
