Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oc adm release new: add check for the existence of profile annotations in release manifests #663

Closed
wants to merge 1 commit into from

Conversation

csrwng
Copy link
Contributor

@csrwng csrwng commented Dec 3, 2020

With the introduction of cluster profiles, only manifests with a cluster profile annotation will be recognized/applied by the cluster version operator.

It should not be possible to create a release with manifests that are not annotated for at least one profile. This PR introduces a check for this.

Related enhancement:
openshift/enhancements#200

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: csrwng
To complete the pull request process, please assign abhinavdahiya after the PR has been reviewed.
You can assign the PR to them by writing /assign @abhinavdahiya in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

}
}
if !hasProfileAnnotation {
return fmt.Errorf("%s: manifests must contain at least one profile annotation set", filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ... must set at least one profile annotation or ... must have at least one profile annotation set or some such?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated error messages

@wking
Copy link
Member

wking commented Dec 3, 2020

images:

error: unable to create a release: cluster-monitoring-operator/0000_50_cluster-monitoring-operator_00_0user-priority-class.yaml: manifests must contain Kubernetes API objects with 'metadata.annotations' set

@wking
Copy link
Member

wking commented Dec 3, 2020

That manifest is indeed in need of a profile declaration.

@csrwng
Copy link
Contributor Author

csrwng commented Dec 3, 2020

That manifest is indeed in need of a profile declaration.

There's a pr for that :) openshift/cluster-monitoring-operator#998

@csrwng
Copy link
Contributor Author

csrwng commented Dec 4, 2020

/test images

1 similar comment
@wking
Copy link
Member

wking commented Dec 15, 2020

/test images

@openshift-merge-robot
Copy link
Contributor

@csrwng: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-serial f698b36 link /test e2e-aws-serial
ci/prow/e2e-aws f698b36 link /test e2e-aws
ci/prow/e2e-agnostic-cmd f698b36 link /test e2e-agnostic-cmd
ci/prow/e2e-aws-upgrade f698b36 link /test e2e-aws-upgrade
ci/prow/images f698b36 link /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking
Copy link
Member

wking commented Jan 12, 2021

/test images

@sjenning
Copy link
Contributor

openshift/cluster-kube-scheduler-operator#319 should fix the CI failure

@wking
Copy link
Member

wking commented Jan 12, 2021

Before actually landing this, we probably need either a flag to turn the check off, or to wait a minor or two for the existing labels to sink in. If we land this as it stands right now, someone using a master oc would be unable to build 4.6 and earlier releases. Maybe that's ok, and everyone building releases is using the oc that they're packaging into the release? Probably worth checking with the test-platform, release-controller, and ART folks if we want to lean on that.

@smarterclayton
Copy link
Contributor

smarterclayton commented Jan 13, 2021

This would break backwards compatibility between new cli and old releases. Why is absence of annotation not implied to mean "all profiles" in the CVO?

@smarterclayton
Copy link
Contributor

/hold

I have some concerns here. CLI tooling currently works on all releases of openshift. This changes that behavior. I agree the use case of verifying that users specify appropriate profiles is important. I would probably focus on adding a parameter which is an allowed set of profiles and if that is set, we reject typo'd profiles. However, you won't be able to add that flag to all invocations of the release tool without changing release controller and ci-operator slightly. I need to see discussion of how you'd roll that flag out there.

The remaining question is whether no profile = all or none. I don't find the argument in the enhancement compelling (and certainly doesn't require that to prevent typos). I would probably argue that no profile = all, and we protect against typos with the allowed profiles flag in merge PRs.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@smarterclayton
Copy link
Contributor

The core problem is that the enhancement change is not backwards compatible to the API of release construction, and I don't agree we should break that API.

@csrwng
Copy link
Contributor Author

csrwng commented Jan 13, 2021

@smarterclayton are you referring to openshift/enhancements#200 (comment) ?
/cc @deads2k

@smarterclayton
Copy link
Contributor

Yes. However, what problem is THIS change solving? Someone accidentally omitted a profile?

@csrwng
Copy link
Contributor Author

csrwng commented Jan 13, 2021

Yes that's the problem it's solving. Happy to condition it on a flag.

@smarterclayton
Copy link
Contributor

Without a plan for how you roll that flag out to the CI infrastructure, the flag won't do much good. If we hadn't broken forwards behavior compatibilty on the API by requiring the profile annotation, you could have hardcoded the list of allowed profiles in oc adm release new (because old release would be the same as the new).

@smarterclayton
Copy link
Contributor

smarterclayton commented Jan 13, 2021

A more effective guard (at least, more effective in being more localized and more concrete) for this would be to have the CVO refuse to start if any manifest contains ANY unspecified profile. That restricts the problem to static up front AND preserves compatibility of client tooling, and would have gated the CVO change from merging (because of the bug in scheduler operator). While I like detecting this in tooling, we should detect this where the object is silently ignored as well. We can then come back and decide whether we change the default behavior in CVO and release both to preserve compatibility of the payload API.

@csrwng
Copy link
Contributor Author

csrwng commented Jan 13, 2021

I like that ... @wking ^^

@wking
Copy link
Member

wking commented Jan 14, 2021

openshift/cluster-kube-scheduler-operator#319 landed.

/retest

A more effective guard (at least, more effective in being more localized and more concrete) for this would be to have the CVO refuse to start if any manifest contains ANY unspecified profile. That restricts the problem to static up front...

But when you forget to label a manifest with a profile, or you typo the profile, or whatever, the result would be that the bootstrap CVO crashloops, install hangs waiting for bootstrap complete, and folks complain to the installer? Unless I can finally talk folks into landing automatic installer analysis of gathered log bundles (openshift/installer#2567) and find some non-brittle way to highlight the CVO issue there? I like guarding against this at release-build time, because it's a lot easier to report quickly to the person who called oc adm release new ..., without having to bubble the error through multiple hosts and layers of code.

While I like detecting this in tooling, we should detect this where the object is silently ignored as well.

Which object is silently ignored? Because neither this oc approach nor a "CVO-dies-on-surprising-manifest" approach would be silently ignoring anything, would they?

The remaining question is whether no profile = all or none.

I like the fact that the current "none" interpretation makes it easy to distinguish folks who have thought about it and decided that their manifest belongs in all profiles from folks who have not heard about profiles and may not have considered whether their manifest belongs in all of them. Of course, even the current schema does not record whether folks have considered each manifest vs. some new profile and then rejected that profile for that manifest, so maybe "I have thought about profiles" is too coarse to be worth recording.

@smarterclayton
Copy link
Contributor

But when you forget to label a manifest with a profile, or you typo the profile, or whatever, the result would be that the bootstrap CVO crashloops, install hangs waiting for bootstrap complete, and folks complain to the installer?

You wouldn't be able to merge your code change to any of the payload repos, because you wouldn't pass e2e-aws

@smarterclayton
Copy link
Contributor

I like the fact that the current "none" interpretation makes it easy to distinguish folks who have thought about it and decided that their manifest belongs in all profiles from folks who have not heard about profiles and may not have considered whether their manifest belongs in all of them. Of course, even the current schema does not record whether folks have considered each manifest vs. some new profile and then rejected that profile for that manifest, so maybe "I have thought about profiles" is too coarse to be worth recording.

The current approach buries the signal in the noise. The signal is "this is different because it has the annotation". Right now you have to review everything to find that.

I think the only way we can have adm release new validate profiles is to allow "" as "any", including in the CVO, and then validate one in oc adm release new the allowed set of profiles as "no annotation is valid", or "1..N annotations with known profile strings are valid". The release tooling then continues to work with all versions of openshift, and the noise around what is the delta is cleaned up, and we only allow a certain set of profiles at release creation time.

@wking
Copy link
Member

wking commented Feb 19, 2021

/test images

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2021

@csrwng: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic-cmd f698b36 link /test e2e-agnostic-cmd
ci/prow/e2e-metal-ipi-ovn-ipv6 f698b36 link /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-aws f698b36 link /test e2e-aws
ci/prow/e2e-aws-serial f698b36 link /test e2e-aws-serial
ci/prow/e2e-aws-upgrade f698b36 link /test e2e-aws-upgrade
ci/prow/images f698b36 link /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2021
@csrwng
Copy link
Contributor Author

csrwng commented May 24, 2021

/close

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2021

@csrwng: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this May 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants