This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Fix HNC startup issues #1127

Merged 1 commit on Sep 22, 2020

adrianludwin
Contributor

This essentially reverts #1087, which breaks HNC on new clusters that
haven't previously had HNC installed. This PR fixes the nondeterministic
crashing problem by patching in kubernetes-sigs/controller-runtime#1155,
which has been applied to controller-runtime 0.6.3 in @adrianludwin's repo.
This is a temporary hack and will be removed when controller-runtime
releases its own fix, likely in 0.6.4.

Tested: with the reversion of #1087 (main.go), HNC can be installed on a
fresh cluster again but fails to start up ~50% of the time. With the fix
to controller-runtime, it passes on 20/20 startup attempts. Ran e2e
tests and got the same result as without this change (four failures).
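
The 20/20 figure above comes from repeating the startup and counting successes. A minimal Go sketch of such a repeat-and-count check follows. It is only an illustration: the PR does not say how the attempts were run, and `countStartups`, `errSimulated`, `flaky`, and `stable` are hypothetical names standing in for a real "install and start HNC" step.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated marks a simulated startup failure in this sketch.
var errSimulated = errors.New("simulated startup crash")

// countStartups calls start once per attempt and returns how many attempts
// completed without an error.
func countStartups(attempts int, start func(attempt int) error) int {
	passed := 0
	for i := 0; i < attempts; i++ {
		if err := start(i); err == nil {
			passed++
		}
	}
	return passed
}

func main() {
	// Hypothetical stand-ins for a real "install and start" step.
	flaky := func(attempt int) error {
		if attempt%2 == 1 {
			return errSimulated // every other attempt fails
		}
		return nil
	}
	stable := func(int) error { return nil }

	fmt.Printf("flaky:  %d/20\n", countStartups(20, flaky))
	fmt.Printf("stable: %d/20\n", countStartups(20, stable))
}
```

In a real check, the function passed to `countStartups` would deploy the controller and wait for it to become ready. The simulated functions here only make the sketch self-contained.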

@k8s-ci-robot added the do-not-merge/invalid-commit-message and cncf-cla: yes labels Sep 21, 2020
@adrianludwin
Contributor Author

/assign @rjbez17
/assign @yiqigao217

@k8s-ci-robot added the size/L label Sep 21, 2020
@k8s-ci-robot added the approved label Sep 21, 2020
@k8s-ci-robot removed the do-not-merge/invalid-commit-message label Sep 21, 2020
Contributor

@yiqigao217 yiqigao217 left a comment

/lgtm
/hold
/assign @rjbez17

@k8s-ci-robot added the do-not-merge/hold and lgtm labels Sep 21, 2020
@rjbez17

rjbez17 commented Sep 22, 2020

/lgtm
/approve
/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label Sep 22, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianludwin, rjbez17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [adrianludwin,rjbez17]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit f990be3 into kubernetes-retired:master Sep 22, 2020
@adrianludwin deleted the startup branch September 22, 2020 16:56
@adrianludwin added this to the hnc-v0.6 milestone Sep 23, 2020
yiqigao217 added a commit to yiqigao217/multi-tenancy that referenced this pull request Nov 24, 2020
kubernetes-retired#1127 is our workaround that patches controller-runtime
v0.6.3 to fix the thread/start-safe bug. Now that the bug fix is in v0.6.4
(see https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.6.4),
we can replace the patched version with v0.6.4.

Tested by make test.
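
The commit message above describes the underlying problem only as a "thread/start-safe bug". As a generic illustration of that class of problem, and not the actual controller-runtime code or its fix, here is a Go sketch of a component that accepts registrations from any goroutine, both before and after it starts. The `registry` type and its methods are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a server that accepts handler
// registrations. It is not controller-runtime code.
type registry struct {
	mu      sync.Mutex
	started bool
	pending []string        // names registered before Start
	active  map[string]bool // names that are live
}

func newRegistry() *registry {
	return &registry{active: map[string]bool{}}
}

// Register may be called from any goroutine, before or after Start.
func (r *registry) Register(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.started {
		r.active[name] = true // already running: activate immediately
		return
	}
	r.pending = append(r.pending, name)
}

// Start activates everything registered so far. A second call is a no-op.
func (r *registry) Start() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.started {
		return
	}
	for _, name := range r.pending {
		r.active[name] = true
	}
	r.pending = nil
	r.started = true
}

// ActiveCount returns the number of live handlers.
func (r *registry) ActiveCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.active)
}

func main() {
	r := newRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.Register(fmt.Sprintf("handler-%d", i))
		}(i)
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		r.Start() // races with the Register calls above
	}()
	wg.Wait()
	fmt.Println("active handlers:", r.ActiveCount()) // prints "active handlers: 50"
}
```

A single mutex guards both the `started` flag and the handler collections, so a registration always sees a consistent state and is either queued or activated. Without the lock, the outcome would depend on goroutine scheduling, which is the kind of nondeterminism the PR reports at startup.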
Labels
approved, cncf-cla: yes, lgtm, size/L
4 participants