Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set webhooks failure policy to Ignore #69

Merged

Conversation

phoracek
Copy link
Member

We are still facing many issues with inaccessible webhook blocking
creation of VMs and especially pods. Let's make our failure policy
ignore for now and investigate mentioned issues only in test
environments for now.

@dankenigsberg
Copy link
Contributor

I think it would be useful to state in the commit message the recently reported mode of failure: macpool prevented the upgrade of OpenShiftSDN. starting the SDN Pod requires the macpool, but the macpool uses SDN addresses, which are unavailable when the SDN is down.

I suspect we have to add a test for SDN recovery when macpool is enabled.

@SchSeba
Copy link
Collaborator

SchSeba commented Sep 17, 2019

can we change the PR to ignore only on the pods?

@dankenigsberg
Copy link
Contributor

can we change the PR to ignore only on the pods?

I clearly recall a promise: "on the next blocker bug we go to Ignore". It was assertive with no questions ;-)

I'm all for testing granular changes, and solving all bugs, but I think that we have waited enough. I vote for Ignore everywhere - but we should file an Issue to track a proper solution.

@phoracek
Copy link
Member Author

@dankenigsberg commit message updated, issue opened #70

@phoracek
Copy link
Member Author

@dankenigsberg @SchSeba lot of our tests expect failure to happen. We are blocking not only cases where Kubernetes fails to reach the webhook, but also cases where the validation simply fails.

@dankenigsberg
Copy link
Contributor

Oh, that's bad. I suspect there is no IgnoreOnlyIfNonresponsive.

I don't see a way around this. Currently we can:

  • disable macpool
  • ignore VM&Pod failures and skip negative tests
  • ignore only Pod failures and skip some tests

Earlier I voted against the third option, but it does allow us to save SOME face. Can we reconsider it?

Either

@phoracek
Copy link
Member Author

I will go over the test we'd need to skip tomorrow to see if it makes sense.

@SchSeba
Copy link
Collaborator

SchSeba commented Sep 18, 2019

+1 on the "ignore only Pod failures and skip some tests"

@dankenigsberg
Copy link
Contributor

@SchSeba would it be possible register the webhook so that it ignores openshift-* namespaces?

@phoracek
Copy link
Member Author

@dankenigsberg we looked into it. You can (de-)select Namespaces only based on their labels, not names. And AFAIK, openshift-* namespaces don't have any specific label. Worth double-checking though.

@phoracek
Copy link
Member Author

@dankenigsberg we can list all namespaces we want to ignore. However, I think we should just set ignore (and revert shortly after release).

@SchSeba
Copy link
Collaborator

SchSeba commented Sep 18, 2019

@dankenigsberg we looked into it. You can (de-)select Namespaces only based on their labels, not names. And AFAIK, openshift-* namespaces don't have any specific label. Worth double-checking though.

We can't do it.

For now we can only specify a label on a namespace that if it exist we skip or we enforce.
But not on the namespace name.

@SchSeba
Copy link
Collaborator

SchSeba commented Sep 18, 2019

@dankenigsberg we can list all namespaces we want to ignore. However, I think we should just set ignore (and revert shortly after release).

We can run on every start on all the namespaces check for openshift-* and update the namespace with a label but we can't control if someone (operator) remove or update that namespace labels

@dankenigsberg
Copy link
Contributor

However, I think we should just set ignore (and revert shortly after release).

Go go go, then.

@phoracek
Copy link
Member Author

@dankenigsberg @SchSeba please review

@@ -252,6 +252,8 @@ var _ = Describe("Virtual Machines", func() {
//2165
Context("when trying to create a VM after a MAC address has just been released duo to a VM deletion", func() {
It("should re-use the released MAC address for the creation of the new VM and not return an error", func() {
Skip("Failures on Pod reconciliation are merely ignored https://github.com/K8sNetworkPlumbingWG/kubemacpool/pull/69")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is for VM why we need to skip it if we change the polocy for pods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only one that failed with Ignore policy on pod.

We are still facing many issues with inaccessible webhook blocking
creation of VMs and especially pods. Let's make our failure policy
ignore for now and investigate mentioned issues only in test
environments for now.

One of these cases happened recently. Update of OpenShiftSDN prevented
KubeMacPool from reconciling and ended up in a dead-lock, where SDN
pods were trying to reach out to KubeMacPool disconnected from the
network.

Signed-off-by: Petr Horacek <phoracek@redhat.com>
Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

the error is not related to this PR

@SchSeba SchSeba merged commit 61ce121 into k8snetworkplumbingwg:master Sep 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants