enhancements/security/openshift-image-policy: Propose new enhancement
Increasing the security of OpenShift release images from the current
"check GPG signatures before initiating an update" to "check Sigstore
signatures on every Pod launch".

Like the (Cluster)ImagePolicy enhancement I'm building on, this
enhancement mostly belongs to the Node component.  But the node
component doesn't seem to have its own subdirectory, so I'm dropping
this into the root enhancements directory for now.
wking committed May 29, 2024
1 parent 57dc5f6 commit b0f05e6
Showing 1 changed file with 278 additions and 0 deletions.
278 changes: 278 additions & 0 deletions enhancements/security/openshift-image-policy.md
@@ -0,0 +1,278 @@
---
title: openshift-image-policy
authors:
- "@wking"
reviewers:
- "@jupierce, updates staff engineer"
- "@mrunalp, node staff engineer"
- "@zaneb, installer staff engineer"
approvers:
- "@mrunalp, node staff engineer"
api-approvers:
- "@JoelSpeed"
creation-date: 2024-05-29
last-updated: 2024-05-29
tracking-link:
- https://issues.redhat.com/browse/OCPSTRAT-1245
see-also:
- "/enhancements/api-review/add-ClusterImagePolicy-and-ImagePolicy-for-Signature-Verification.md"
---

# OpenShift Image Policy

## Summary

[An existing enhancement](/enhancements/api-review/add-ClusterImagePolicy-and-ImagePolicy-for-Signature-Verification.md) is adding ClusterImagePolicy and ImagePolicy primitives to configure image policy.
This enhancement describes how those primitives will be used by OpenShift clusters to secure core workloads:

* The `quay.io/openshift-release-dev/ocp-release` scope, used for OpenShift release images, will require a Sigstore signature from Red Hat, for any namespace in the cluster.
* Users may add policies of their own to impose further requirements as they see fit, including for the `quay.io/openshift-release-dev` scope.

## Motivation

As an OpenShift user, I want to verify the release image using [Sigstore](https://docs.sigstore.dev/about/overview/), so that I can verify those images come from Red Hat.

### User Stories

* As a cluster administrator, I want to be more confident that the OpenShift release image was signed by Red Hat, checking whenever a pod is launched from such an image, and not just at update-time.
* As a cluster administrator, I want to be able to declare my own additional image policies, including restricting the use of the `quay.io/openshift-release-dev` scope.
* As a Red Hat CI job maintainer or quality engineer, I want a way to disable Sigstore verification while testing releases before they have been signed.

### Goals

* Define an initial OpenShift ClusterImagePolicy that will be installed in all 4.17 clusters to restrict `quay.io/openshift-release-dev/ocp-release` to Red-Hat-signed images.

### Non-Goals

There are a number of possible next-steps that I'm punting on for now, as I try to nibble off the smallest possible slice of progress.
This enhancement will not:

* Figure out how to configure a ClusterImagePolicy for verifying a mirrored release image, although this will need to happen before we can use Sigstore signatures in disconnected/restricted-network environments.
* Suggest changes to mirroring tools, although they will need to learn to mirror Sigstore signatures for use in disconnected/restricted-network environments.
* Suggest changes to the ImageStream controller, although it will need to learn to mirror Sigstore signatures for verification when running pods from ImageStream sources, e.g. [must-gather][].
* Suggest policies for `quay.io/openshift-release-dev/ocp-v4.0-art-dev`, although those would help tighten down security for other `openshift-*` namespaces.
* Support testing unsigned releases as HyperShift hosted clusters on a management cluster that requires signed releases.

## Proposal

This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
The new ClusterImagePolicy will be managed by the cluster-version operator and applied to all clusters by default (initially tech-preview, gated by the `ImagePolicy` feature gate, which is [currently `TechPreviewNoUpgrade`][feature-gate]).

Internal test clusters, which need to run releases that are not signed by official Red Hat keys, will use a new install-config property `internalTestingImagePolicy: Disable` (also gated by the `ImagePolicy` feature gate).
When set to `Disable`, [the installer's ClusterVersion `overrides` handling][installer-overrides] will include an entry for the new `openshift` ClusterImagePolicy.
When [ClusterVersion `overrides`](/dev-guide/cluster-version-operator/dev/clusterversion.md#setting-objects-unmanaged) is set from cluster-installation like this, the cluster-version operator will never create the resource, so there is no need for further tooling to delete it.
Having `overrides` set blocks the cluster-version operator from updating, but because all internal updates between unsigned releases use `force`, that is not a problem for that use-case.

### Workflow Description

There is no supported workflow for this enhancement; when the relevant feature gate is enabled, this functionality works transparently without cluster administrator involvement.

For Red-Hat-internal CI and QE, the agent configuring the install-config needs to set `internalTestingImagePolicy: Disable`.
FIXME: what does the installer do these days about unrecognized properties in install-config when deserializing YAML?
Can we set this regardless of whether the `ImagePolicy` feature gate is enabled, or do we need to find some way to make that conditional?
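
As a rough illustration of the internal-testing case, an install-config carrying the proposed property might look like the sketch below.
The `internalTestingImagePolicy` property is the one proposed in this enhancement and does not exist yet; the cluster name and base domain are placeholders, and the other properties a real install-config requires are omitted.

```yaml
# Hypothetical install-config fragment; not a complete install-config.
apiVersion: v1
baseDomain: example.com                # placeholder
metadata:
  name: unsigned-release-test          # placeholder
featureSet: TechPreviewNoUpgrade       # currently required for the ImagePolicy feature gate
internalTestingImagePolicy: Disable    # proposed property, as spelled in this enhancement
```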

### API Extensions

As part of implementing this policy, we will drop the following Godocs from [ClusterImagePolicy][ClusterImagePolicy-scope-restriction] and [ImagePolicy][ImagePolicy-scope-restriction]:

```golang
// Please be aware that the scopes should not be nested under the repositories of OpenShift Container Platform images.
// If configured, the policies for OpenShift Container Platform repositories will not be in effect.
```

[The backing implementation][container-runtime-scope-restriction] will also be removed.
This guard was [initially intended][scope-restriction-motivation] to protect cluster administrators from breaking their clusters by configuring policies that blocked critical images.
And before Red Hat was publishing signatures for `quay.io/openshift-release-dev/ocp-release` releases, that made sense.
But now that Red Hat is [almost][OTA-1267] publishing Sigstore signatures for those release images, it makes sense to allow policies covering those images.
And even if a cluster administrator creates a policy that blocks critical image pulls, PodDisruptionBudgets should keep the Kubernetes API server and related core workloads running for long enough for the cluster administrator to use the Kube API to remove or adjust the problematic policy.

### Topology Considerations

#### Hypershift / Hosted Control Planes

HyperShift clusters run pods [on the management cluster and in the hosted cluster][hypershift-concepts].
A ClusterImagePolicy applied to the hosted Kubernetes API server will only be distributed to hosted nodes, and none of those nodes are likely to run an image from `quay.io/openshift-release-dev/ocp-release`.
However, a ClusterImagePolicy applied to the management cluster's Kubernetes API server would cover both the management cluster's cluster-version operator (CVO), and the CVOs associated with hosted clusters (which run as pods in management-cluster namespaces).
That means that it would take an ImagePolicy to be able to distinguish between constraints on the management cluster's CVO and constraints on the hosted components running on the management cluster, which is out of scope for this enhancement.

#### Standalone Clusters

This proposal is aimed at standalone clusters.

#### Single-node Deployments or MicroShift

This proposal is aimed at single-node OpenShift deployments (SNOs), and will function there the same way it does in standalone clusters.
It is not expected to affect resource consumption, because pod-launches are expected to be a relatively insignificant fraction of overall pod lifetime, and the `quay.io/openshift-release-dev/ocp-release` scope check covered by this proposal is expected to cheaply exclude most pods.

This proposal is not relevant for MicroShift, which does not run a cluster-version operator or machine-config operator, and which handles trust via RPM signing.

### Implementation Details/Notes/Constraints

#### Machine-config operator

As discussed in [the *API Extensions* section](#api-extensions), the scope guard around OCP images will be dropped.

The machine-config operator will also grow a new release manifest, which will be reconciled by the cluster-version operator:

```yaml
kind: ClusterImagePolicy
metadata:
  name: openshift
  annotations:
    kubernetes.io/description: Require Red Hat signatures for quay.io/openshift-release-dev/ocp-release container images.
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    release.openshift.io/feature-set: TechPreviewNoUpgrade
spec:
  scopes:
  - quay.io/openshift-release-dev/ocp-release
  policy:
    rootoftrust:
      policyType: PublicKey
      keydata: LS0tLS1CRUdJTiBQVUJMSUMgS0VZLS0tLS0KTUlJQ0lqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FnOEFNSUlDQ2dLQ0FnRUEzQzJlVGdJQUo3aGxveDdDSCtIcE1qdDEvbW5lYXcyejlHdE9NUmlSaEgya09ZalRadGVLSEtnWUJHcGViajRBcUpWYnVRaWJYZTZKYVFHQUFER0VOZXozTldsVXpCby9FUUEwaXJDRnN6dlhVbTE2cWFZMG8zOUZpbWpsVVovaG1VNVljSHhxMzR2OTh4bGtRbUVxekowR0VJMzNtWTFMbWFEM3ZhYmd3WWcwb3lzSTk1Z1V1Tk81TmdZUHA4WDREaFNoSmtyVEl5dDJLTEhYWW5BMExzOEJlbG9PWVJlTnJhZmxKRHNzaE5VRFh4MDJhQVZSd2RjMXhJUDArRTlZaTY1ZE4zKzlReVhEOUZ6K3MrTDNjZzh3bDdZd3ZZb1Z2NDhndklmTHlJbjJUaHY2Uzk2R0V6bXBoazRjWDBIeitnUkdocWpyajU4U2hSZzlteitrcnVhR0VuVGcyS3BWR0gzd3I4Z09UdUFZMmtqMnY1YWhnZWt4V1pFN05vazNiNTBKNEpnYXlpSnVSL2R0cmFQMWVMMjlFMG52akdsMXptUXlGNlZnNGdIVXYwaktrcnJ2QUQ4c1dNY2NBS00zbXNXU01uRVpOTnljTTRITlNobGNReG5xU1lFSXR6MGZjajdYamtKbnAxME51Z2lVWlNLeVNXOHc0R3hTaFNraGRGbzByRDlkVElRZkJoeS91ZHRQWUkrK2VoK243QTV2UVV4Wk5BTmZqOUhRbC81Z3lFbFV6TTJOekJ2RHpHellSNVdVZEVEaDlJQ1I4ZlFpMVIxNUtZU0h2Tlc3RW5ucDdZT2d5dmtoSkdwRU5PQkF3c1pLMUhhMkJZYXZMMk05NDJzSkhxOUQ1eEsrZyszQU81eXp6V2NqaUFDMWU4RURPcUVpY01Ud05LOENBd0VBQVE9PQotLS0tLUVORCBQVUJMSUMgS0VZLS0tLS0K
```

#### Installer

As discussed in [the *Proposal* section](#proposal), the installer will grow a new, feature-gated `internalTestingImagePolicy` property that drives an extension of ClusterVersion `overrides`.
When set to `Disable`, the following `overrides` entry will be added to the ClusterVersion manifest:

```yaml
kind: ClusterImagePolicy
group: config.openshift.io
name: openshift
unmanaged: true
```
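
For context, this is roughly how that entry would sit in the ClusterVersion manifest the installer renders.
It is only a sketch: the other `spec` fields are elided, and the empty `namespace` is an assumption based on ClusterImagePolicy being a cluster-scoped resource.

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  # ...other spec fields elided...
  overrides:
  - kind: ClusterImagePolicy
    group: config.openshift.io
    name: openshift
    namespace: ""     # assumption: left empty for a cluster-scoped resource
    unmanaged: true
```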

##### Installer callers

Hive, the assisted installer, and other actors who create install-configs will need to learn to set `internalTestingImagePolicy: Disable` if they wish to test unsigned releases that enable the feature gate.

### Risks and Mitigations

There is not much risk for connected clusters.
If we decide that the policy is not helpful, we can remove it from future releases via a [deletion annotation](/enhancements/update/object-removal-manifest-annotation.md).

There is a transitional cost to the additional step (`internalTestingImagePolicy` for installs, `overrides` for updating older clusters) needed to run an unsigned release image for internal testing.
But that transitional cost is only expected to affect internal use-cases; external users should always be using signed releases.

Disconnected/restricted-network use is out of scope for this enhancement, so external clusters running `TechPreviewNoUpgrade` and getting this feature will also need the `internalTestingImagePolicy` or `overrides` workaround.
I expect they'll need:

```yaml
signedIdentity:
  matchpolicy: RemapIdentity
  ...
```

in their image policy, but they'll also need some kind of config knob to give them access to that property, maybe also in the install-config.
Or maybe something already in the install-config could populate that portion of the image policy.
In any case, it's out of scope for a reason, but until it gets picked up, it's going to make things more difficult for `TechPreviewNoUpgrade` disconnected/restricted-network clusters.

How will security be reviewed and by whom?

I have no idea; suggestions welcome :).

There will be a gap during install before the ClusterImagePolicy has been created and rolled out to nodes, during which unsigned `quay.io/openshift-release-dev/ocp-release` images might run.
But we do not perform any in-cluster signature checking during installs today; they are secured by a trusted installer using a baked-in release image digest.
Trust in the installer is either transitive via trust in the hosting site from which the installer was downloaded, or via mirror signatures like [this one][4.15.0-multi-mirror-signature].

How will UX be reviewed and by whom?

I expect "images are failing to meet the relevant image policy/policies" to be something with a solid user experience, regardless of whether OpenShift defines its own image policy, or leaves image policies entirely up to users.

### Drawbacks

We're building a new thing here without a lot of experience defining ClusterImagePolicies.
And we have no experience at all with policies for the `quay.io/openshift-release-dev/ocp-release` scope, because those release images don't have Sigstore signatures [yet][OTA-1267].
Designing core-image verification without much experience is risky.
But passing around GPG signatures in non-registry storage for disconnected/restricted-networks is awkward, so doing nothing has risks too.

Building something that is partly designed to make mirroring and disconnected/restricted-network use easier without understanding what the disconnected/restricted-network solution will look like is also risky.
We could mitigate that with more clarity on what the disconnected/restricted-network solution would look like.

## Test Plan

Testing release signature validation is a gap in our current process, because we don't have triggers to run CI jobs after a release has been signed _and_ a later release that we can update to has also been signed.
We could fill this in with a periodic, but if we test once A and B have both been signed, it's likely that A has already been shipped, and it's too late to catch and fix any bugs the tests turn up in A.
We could close this gap by signing CI releases with a CI key, [as envisioned years ago for GPG signatures][ci-signing-key], but so far never implemented.
It's unclear if this gap is considered severe enough to be worth that kind of investment, when occasional manual testing has done a fairly good job at keeping bugs out of slow-changing code (although we have had at least [one bug][verified-vs-graceful-shutdown] with signature handling that CI might have caught).

## Graduation Criteria

This functionality is expected to use [the existing `ImagePolicy` feature gate][feature-gate], and have the same graduation criteria.

### Dev Preview -> Tech Preview

[The existing `ImagePolicy` feature gate is already `TechPreviewNoUpgrade`][feature-gate], so this transition will never happen for this new functionality.

### Tech Preview -> GA

This functionality is expected to use [the existing `ImagePolicy` feature gate][feature-gate], and have the same graduation criteria.

### Removing a deprecated feature

This proposal does not deprecate any features.
This proposal does not remove any features, with the possible exception of the removal of the OCP image scope guard discussed in [the *API Extensions* section](#api-extensions).

## Upgrade / Downgrade Strategy

Existing clusters can return to the previous behavior after updating by setting the `overrides` entry mentioned in [the *Installer* section](#installer) and removing the policy with:

```console
$ oc delete clusterimagepolicy openshift
```
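
One way to set that `overrides` entry on a running cluster is a JSON patch against the ClusterVersion object.
The following is a sketch rather than a tested procedure: the file name `override-patch.yaml` is arbitrary, the empty `namespace` is an assumption for a cluster-scoped resource, and an `add` on `/spec/overrides` replaces any overrides that are already set, so it would need adjusting on a cluster with existing entries.

```yaml
# override-patch.yaml (hypothetical file name)
- op: add
  path: /spec/overrides
  value:
  - kind: ClusterImagePolicy
    group: config.openshift.io
    name: openshift
    namespace: ""     # assumption: left empty for a cluster-scoped resource
    unmanaged: true
```

A patch like this could be applied with `oc patch clusterversion version --type json -p "$(cat override-patch.yaml)"` before deleting the policy.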

## Version Skew Strategy

The new ClusterImagePolicy manifest should have a late enough run-level that the cluster-version operator (CVO) applies it after the ClusterImagePolicy CRD and any machine-config tooling needed to process the custom resource.
Because the machine-config operator is at run-level 80, late in the update, this means that updates into new image policies will be applied after most other components have rolled out their new images, which may require slower ratcheting for things like signing key rotations.

## Operational Aspects of API Extensions

None of the changes proposed in [the *API Extensions* section](#api-extensions) have operational impact.

## Support Procedures

I expect "images are failing to meet the relevant image policy/policies" to be something with a solid support experience, regardless of whether OpenShift defines its own image policy, or leaves image policies entirely up to users.

The process for disabling the image policy is described in [the *Upgrade / Downgrade Strategy* section](#upgrade--downgrade-strategy).

## Alternatives

### Continue with GPG signatures

FIXME

### Have the cluster-version operator check Sigstore signatures directly

FIXME

### Set a broader policy for more Red Hat images

FIXME

### Use an ImagePolicy for `openshift-*` namespaces

FIXME

### Wait for a Red Hat Rekor

FIXME

### Have the machine-config operator manage the policy directly

FIXME

[4.15.0-multi-mirror-signature]: https://mirror.openshift.com/pub/openshift-v4/multi/clients/ocp/4.15.0/sha256sum.txt.gpg
[ci-signing-key]: https://github.com/openshift/cluster-update-keys/blame/c0641e083d1bbe7d6f78e7e0c98c6d56a8b89d24/manifests/0000_90_cluster-update-keys_configmap.yaml#L4-L5
[ClusterImagePolicy-scope-restriction]: https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/config/v1alpha1/types_cluster_image_policy.go#L41-L42
[container-runtime-scope-restriction]: https://github.com/openshift/machine-config-operator/blob/4b809a4214f/pkg/controller/container-runtime-config/container_runtime_config_controller.go#L1053-L1057
[feature-gate]: https://github.com/openshift/api/blame/ba11c1587003dc84cb014fd8db3fa597a3faaa63/features/features.go#L450-L455
[hypershift-concepts]: https://hypershift-docs.netlify.app/reference/concepts-and-personas/#concepts
[ImagePolicy-scope-restriction]: https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/config/v1alpha1/types_image_policy.go#L40-L41
[installer-overrides]: https://github.com/openshift/installer/blob/d48a31ffb568f6a0c94d050af0d2f91652e561bc/pkg/asset/ignition/bootstrap/cvoignore.go#L96-L105
[must-gather]: https://github.com/openshift/oc/blob/0d1b31450255dca5d1bf1761b5594c2f050868af/pkg/cli/admin/mustgather/mustgather.go#L228-L274
[OTA-1267]: https://issues.redhat.com/browse/OTA-1267
[scope-restriction-motivation]: https://github.com/openshift/enhancements/pull/1402#discussion_r1223543692
[verified-vs-graceful-shutdown]: https://github.com/openshift/cluster-version-operator/pull/501
