
OCPBUGS-4991: Fix dynamic conversion webhook issue #2912

Merged 1 commit on May 11, 2023

@tchughesiv (Contributor) commented Jan 4, 2023

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

Description of the change:
With this change, relevant CRDs will properly retain dynamic conversion webhook settings in crd.spec.conversion.webhook.clientConfig.

Motivation for the change:
Currently, during a fresh install of an operator with conversion webhooks enabled, crd.spec.conversion.webhook.clientConfig is at first dynamically updated, as expected, with the proper webhook namespace, name, and caBundle. Within a few seconds, however, those critical settings are overwritten with the conversion settings from the CRD packaged in the bundle. This breaks the operator and prevents the installation from completing successfully.

If that same operator version is instead installed as part of an upgrade from a prior release, the dynamic clientConfig settings are retained and everything works as expected. The difference appears to be due to how the CSV status.phase progresses in the two cases.

Testing remarks:

Steps to Reproduce:

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

# Eventually, the clientConfig settings will revert to the following and stay that way.
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' 

map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]

Post-fix:

The crd.spec.conversion.webhook.clientConfig should instead retain the following settings.

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' 

map[caBundle:LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lJZXdhVHNLS0hhbWd3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNxxxxxxxxxxxxx service:map[name:dbaas-operator-controller-manager-service namespace:redhat-dbaas-operator path:/convert port:443]]
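The manual check above can be scripted. The following is a minimal sketch, not part of this PR: the helper names `has_ca_bundle` and `check_client_config_stable`, the default attempt count, and the polling interval are all assumptions chosen for illustration. It reuses the same `oc get crd ... --template` command shown above and treats the presence of `caBundle:` in the rendered map as the signal that the dynamic settings are still in place.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the rendered clientConfig
# (the Go-template output shown above) contains a caBundle entry.
has_ca_bundle() {
  case "$1" in
    *caBundle:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: polls the CRD and fails as soon as caBundle is missing.
# Arguments: CRD name, number of checks (default 12), seconds between checks (default 5).
check_client_config_stable() {
  crd="$1"
  attempts="${2:-12}"
  interval="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Same command as in the reproduction steps; exit code 2 means oc itself failed.
    out=$(oc get crd "$crd" --template '{{ .spec.conversion.webhook.clientConfig }}') || return 2
    if ! has_ca_bundle "$out"; then
      echo "clientConfig lost caBundle: $out" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "clientConfig retained caBundle for $attempts checks"
}
```

A call such as `check_client_config_stable dbaasproviders.dbaas.redhat.com` would be started only after the initial dynamic update has happened, since the packaged settings without caBundle are what the CRD carries before that point.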

Additional info:

Oddly enough, if the operator is installed as an upgrade rather than as a fresh install, the webhook settings are set properly and permanently, and everything works as expected. This can be tested in a fresh cluster as follows.

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/703109961f22ab379a45a401be0cf351/raw/2d0541b76876a468757269472e8e3a31b86b3c68/conversionWorksUpgrade.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@openshift-ci bot added the do-not-merge/work-in-progress and needs-ok-to-test labels Jan 4, 2023

openshift-ci bot commented Jan 4, 2023

Hi @tchughesiv. Thanks for your PR.

I'm waiting for an operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tchughesiv changed the title from "fix dynamic conversion webhook issue" to "[WIP] fix dynamic conversion webhook issue" Jan 4, 2023
@tchughesiv force-pushed the fixConversion branch 2 times, most recently from 781357f to b4a96c1, January 5, 2023 14:51
@tchughesiv changed the title from "[WIP] fix dynamic conversion webhook issue" to "Fix dynamic conversion webhook issue" Jan 5, 2023
@tchughesiv marked this pull request as ready for review January 5, 2023 17:05
@openshift-ci bot removed the do-not-merge/work-in-progress label Jan 5, 2023
@tchughesiv changed the title from "Fix dynamic conversion webhook issue" to "OCPBUGS-4991: Fix dynamic conversion webhook issue" Jan 5, 2023
@openshift-ci-robot added the jira/invalid-bug label Jan 5, 2023
@openshift-ci-robot (Collaborator)

@tchughesiv: This pull request references Jira Issue OCPBUGS-4991, which is invalid:

  • expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Signed-off-by: Tommy Hughes <tohughes@redhat.com>

Description of the change:
With this change, relevant CRDs will properly retain dynamic conversion webhook settings in crd.spec.conversion.webhook.clientConfig.

Motivation for the change:
Currently, during a fresh install of an operator with conversion webhooks enabled, crd.spec.conversion.webhook.clientConfig is dynamically updated initially, as expected, with the proper webhook ns, name, & caBundle. However, within a few seconds, those critical settings are overwritten with the bundle’s packaged CRD conversion settings. This breaks the operator and stops the installation from completing successfully.

Oddly, though, if that same operator version is installed as part of an upgrade from a prior release, the dynamic clientConfig settings are retained and everything works as expected.

Testing remarks:

Steps to Reproduce:

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

# Eventually, the clientConfig settings will revert to the following and stay that way.
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' 

map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]

conversion:
  strategy: Webhook
  webhook:
    clientConfig:
      service:
        namespace: openshift-dbaas-operator
        name: dbaas-operator-webhook-service
        path: /convert
        port: 443
    conversionReviewVersions:
      - v1alpha1
      - v1beta1

Post-fix:

The crd.spec.conversion.webhook.clientConfig should instead retain the following settings.

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' 

map[caBundle:LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lJZXdhVHNLS0hhbWd3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNxxxxxxxxxxxxx service:map[name:dbaas-operator-controller-manager-service namespace:redhat-dbaas-operator path:/convert port:443]]

conversion:
  strategy: Webhook
  webhook:
    clientConfig:
      service:
        namespace: redhat-dbaas-operator
        name: dbaas-operator-controller-manager-service
        path: /convert
        port: 443
      caBundle: >-
        LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lJZXdhVHNLS0hhbWd3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNxxxxxxxxxxxxx

Additional info:

Oddly enough, if the operator is installed as an upgrade rather than as a fresh install, the webhook settings are set properly and permanently, and everything works as expected. This can be tested in a fresh cluster as follows.

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/703109961f22ab379a45a401be0cf351/raw/2d0541b76876a468757269472e8e3a31b86b3c68/conversionWorksUpgrade.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted


@tchughesiv (Contributor, Author)

/jira refresh

@openshift-ci-robot (Collaborator)

@tchughesiv: This pull request references Jira Issue OCPBUGS-4991, which is invalid:

  • expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@tchughesiv (Contributor, Author)

/jira refresh

@openshift-ci-robot added the jira/valid-bug and bugzilla/valid-bug labels and removed the jira/invalid-bug label Jan 5, 2023
@openshift-ci-robot (Collaborator)

@tchughesiv: This pull request references Jira Issue OCPBUGS-4991, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @kuiwang02

In response to this:

/jira refresh


@openshift-ci bot requested a review from kuiwang02 January 5, 2023 17:38
@openshift-ci-robot (Collaborator)

@tchughesiv: This pull request references Jira Issue OCPBUGS-4991, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @kuiwang02

@tchughesiv (Contributor, Author)

it appears the failed tests are unrelated to these changes. the flaky one seems to be due to a known issue and minikube may just have to be re-run. how should we proceed?

@perdasilva (Collaborator)

/ok-to-test

@openshift-ci bot added the ok-to-test label Feb 28, 2023
@openshift-ci bot removed the needs-ok-to-test label Feb 28, 2023
@perdasilva (Collaborator)

@tchughesiv I've rebased your PR from this side ^^

@perdasilva added the lgtm label Feb 28, 2023
@openshift-ci bot removed the lgtm label Feb 28, 2023

openshift-ci bot commented Feb 28, 2023

New changes are detected. LGTM label has been removed.

@perdasilva (Collaborator)

@tchughesiv could I also ask you to add some tests?

@tchughesiv (Contributor, Author)

@perdasilva sorry, been sidetracked w/ other work. I'll prioritize this work

@tchughesiv force-pushed the fixConversion branch 3 times, most recently from d2e49a4 to e831e88, April 5, 2023 15:38
@tchughesiv (Contributor, Author) commented Apr 5, 2023

@perdasilva @asmacdo ready for review

@m1kola (Member) commented Apr 14, 2023

I think changes look good, but I'm not adding lgtm just yet because I want to try and reproduce first. I tried to quickly reproduce today on a kind cluster and resolution failed. Looks like there is some dependency which I do not have in my catalog. I didn't have time to look deeper into it, but I'll come back to it on Monday.

@perdasilva force-pushed the fixConversion branch 2 times, most recently from a8f873e to 7d8e306, April 24, 2023 13:12
Signed-off-by: Tommy Hughes <tohughes@redhat.com>
@perdasilva (Collaborator)

ok, I've managed to verify the bug as per the description:

map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
map[caBundle:LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJ2RENDQVdLZ0F3SUJBZ0lJUWI0U2Vaa2tGeW93Q2dZSUtvWkl6ajBFQXdJd1FqRVdNQlFHQTFVRUNoTU4KVW1Wa0lFaGhkQ3dnU1c1akxqRW9NQ1lHQTFVRUF4TWZiMnh0TFhObGJHWnphV2R1WldRdE5ERmlaVEV5TnprNQpPVEkwTVRjeVlUQWVGdzB5TXpBMU1URXhNRFF6TlRsYUZ3MHlOVEExTVRBeE1EUXpOVGxhTUVJeEZqQVVCZ05WCkJBb1REVkpsWkNCSVlYUXNJRWx1WXk0eEtEQW1CZ05WQkFNVEgyOXNiUzF6Wld4bWMybG5ibVZrTFRReFltVXgKTWpjNU9Ua3lOREUzTW1Fd1dUQVRCZ2NxaGtqT1BRSUJCZ2dxaGtqT1BRTUJCd05DQUFSYUh1Yk1nUmpqSUJQWQpvQ21YTCt5OU9hMEZ5bW03a2tsTVFpbXVOb01EZ0VYanZSRERpRUYzK002blg2NkFwbDRBYXFHM2pNRXU4VHpCCjRGS2lYN2hwbzBJd1FEQU9CZ05WSFE4QkFmOEVCQU1DQWdRd0R3WURWUjBUQVFIL0JBVXdBd0VCL3pBZEJnTlYKSFE0RUZnUVVMdnh1dWFKSnI0Y1NaV2ZhaHRKZmFxRk9NOWd3Q2dZSUtvWkl6ajBFQXdJRFNBQXdSUUloQUxIWQpQMmlTQk1JUDRsQ25idkVsc2UyQy8zNFdjUkpqZDB1eDM0KzNPVm1GQWlCWXZJUXZEWWp5SEdtZ0VFZzB5MnpDClpLbG5vUFhpV24rWDYzS3ROVmZWYnc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg== service:map[name:dbaas-operator-controller-manager-service namespace:redhat-dbaas-operator path:/convert port:443]]
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
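The flip visible in the output above (one line carrying a caBundle, then a revert to the packaged service settings) can be checked mechanically. A minimal sketch, assuming the Go-template `map[...]` rendering shown above and assuming that the presence of a `caBundle` key is what marks the dynamically injected config; the function name `classify_client_config` is hypothetical and not part of `oc` or OLM:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or OLM): classify one rendered
# clientConfig line. Prints "dynamic" when a caBundle key is present,
# "static" when only the packaged service settings remain.
classify_client_config() {
  case "$1" in
    *"caBundle:"*) echo "dynamic" ;;
    *)             echo "static" ;;
  esac
}

# Possible usage against a live cluster, reusing the command from the
# description (left commented out because it needs cluster access):
# classify_client_config "$(oc get crd dbaasproviders.dbaas.redhat.com \
#   --template '{{ .spec.conversion.webhook.clientConfig }}')"
```

With the bug present, a fresh install would be expected to settle on "static"; with the fix, on "dynamic".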

trying to verify the fix

@perdasilva
Collaborator

verified fix - lgtm!

Collaborator

@perdasilva perdasilva left a comment

lgtm - double checked the bug and the fix

@openshift-ci

openshift-ci bot commented May 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva, tchughesiv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label May 11, 2023
@perdasilva perdasilva merged commit 858c7cd into operator-framework:master May 11, 2023
@openshift-ci-robot
Collaborator

@tchughesiv: Jira Issue OCPBUGS-4991: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-4991 has been moved to the MODIFIED state.

In response to this:

Signed-off-by: Tommy Hughes tohughes@redhat.com

Description of the change:
With this change, relevant CRDs will properly retain dynamic conversion webhook settings in crd.spec.conversion.webhook.clientConfig.

Motivation for the change:
Currently, during a fresh install of an operator with conversion webhooks enabled, crd.spec.conversion.webhook.clientConfig is dynamically updated initially, as expected, with the proper webhook ns, name, & caBundle. However, within a few seconds, those critical settings are overwritten with the bundle’s packaged CRD conversion settings. This breaks the operator and stops the installation from completing successfully.

However, if that same operator version is installed as part of an upgrade from a prior release... the dynamic clientConfig settings are retained and all works as expected. This appears to be due to the differences in CSV status.phase progression.

Testing remarks:

Steps to Reproduce:

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

# Eventually, the clientConfig settings will revert to the following and stay that way.
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' 

map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]

Post-fix:

The crd.spec.conversion.webhook.clientConfig should instead retain the following settings.

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' 

map[caBundle:LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lJZXdhVHNLS0hhbWd3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNxxxxxxxxxxxxx service:map[name:dbaas-operator-controller-manager-service namespace:redhat-dbaas-operator path:/convert port:443]]
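To confirm that a retained caBundle is an actual certificate rather than an arbitrary string, its value can be base64-decoded and checked for a PEM header. A rough sketch, assuming the `map[...]` rendering shown above and a `base64` that accepts `-d` (GNU coreutils; other platforms may differ); the function name `ca_bundle_is_pem` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or OLM): succeed only if the rendered
# clientConfig line carries a caBundle whose base64-decoded form begins with
# a PEM certificate header.
ca_bundle_is_pem() {
  local bundle
  # Extract the value following "caBundle:" up to the next space or bracket.
  bundle=$(printf '%s\n' "$1" | sed -n 's/.*caBundle:\([^] ]*\).*/\1/p')
  # No caBundle key at all: treat as failure.
  [ -n "$bundle" ] || return 1
  # Decode and inspect only the first line of the result.
  printf '%s' "$bundle" | base64 -d 2>/dev/null | head -n 1 |
    grep -q '^-----BEGIN CERTIFICATE-----$'
}

# Possible usage against a live cluster (commented out; needs cluster access):
# ca_bundle_is_pem "$(oc get crd dbaasproviders.dbaas.redhat.com \
#   --template '{{ .spec.conversion.webhook.clientConfig }}')" && echo "caBundle looks like PEM"
```

This only checks the header line; it does not validate the certificate or compare it against the CA that OLM generated.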

Additional info:

Oddly enough, if the operator is currently installed as an upgrade rather than as a fresh install, the webhook settings are set properly and permanently, and everything works as expected. This can be tested in a fresh cluster as follows.

$ oc apply -f https://gist.githubusercontent.com/tchughesiv/703109961f22ab379a45a401be0cf351/raw/2d0541b76876a468757269472e8e3a31b86b3c68/conversionWorksUpgrade.yaml
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.

4 participants