Update OKD proposal
LorbusChris committed Dec 10, 2020
1 parent cdb4d70 commit 00ba40d
Showing 1 changed file with 32 additions and 102 deletions.
134 changes: 32 additions & 102 deletions enhancements/okd-on-fedora-coreos.md
@@ -15,7 +15,7 @@ reviewers:
- "@cgwalters"
approvers:
creation-date: 2019-10-21
last-updated: 2020-05-18
last-updated: 2020-12-10
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
see-also:
replaces:
@@ -32,128 +32,58 @@ superseded-by:
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift/docs]

## Open Questions [optional]

1. When is OCP migrating to Ignition config spec v3?

Please see the Graduation Criteria section for more context on both open questions.

## Summary

OKD is the OpenShift community distribution of Kubernetes.

As part of the version 4 effort, the OKD working group has agreed to target Fedora CoreOS (FCoS) as the primary base operating system for OKD.
As part of the version 4 effort, the OKD working group has decided to target Fedora CoreOS (FCOS) as the primary base operating system for OKD control plane and worker nodes.

This document contains a proposal to create all OKD artifacts in OpenShift's Prow instance from the canonical OpenShift master and release branches alongside OCP artifacts.

Eventually, container images for all components will be shared between OKD and OCP, with the exception of the machine-os-content container image which will be based on Fedora CoreOS content, and any container images that are based on RHEL images in OCP. These will be rebuilt on either a UBI, Fedora or CentOS base for OKD.

A lot of this is already done. What's left is the un-forking of Machine Config Operator and (optionally) Installer repository and building variant-agnostic artifacts of them, as well as changing the Samples Operator to not go degraded on OKD clusters.
This document contains a proposal to add missing FCOS support to key components (`machine-config-operator` and `installer`),
create OKD artifacts in OpenShift's Prow instance from the canonical OpenShift master and release branches alongside OCP CI artifacts,
and regularly create OKD release payloads from there.

## Motivation

Fedora CoreOS includes Ignition v2.x to use for the firstboot configuration of newly provisioned machines with Ignition configuration specification 3.x while RHEL CoreOS currently includes and uses Ignition v0.x supporting config spec 2.x.

Ignition configuration is a declarative format representing the desired on-disk state of the machine.

The installer uses Terraform to provision the bootstrap machine with a config encoded in a spec understood by the host system. After this initial provisioning of the bootstrap host, a Machine Config Server (MCS) is started on the bootstrap host to serve Ignition config to all nodes joining for the initial installation. Subsequently, the bootstrap host is destroyed and another instance of the MCS on the cluster continues to serve Ignition config for additional nodes that want to join the cluster later.

Only a limited subset of the Ignition config fields is supported by the MCS (i.e. config that describes the on-disk state like files and systemd units).

In OCP this on-disk state representation used within the Installer and the MCS is currently encoded in Ignition spec v2.2 config types, while v3.0 config types are used in OKD. To account for the different spec types, OKD's Installer binary as well as OKD's Machine Config Operator (MCO) container image are currently built from forked codebases.

Long-term maintenance of those forks is not desirable. The MCO fork represents an OKD 4 GA release blocker. The installer fork, while not blocking GA, should be dissolved after OCP migrates to Ignition config spec v3.
The OKD Working Group wants to deliver a community variant of OpenShift on top of Fedora CoreOS
that is released off of the latest OpenShift stable release branch in a rolling manner,
and provides the ability to update. An update release payload for OKD encapsulates a new release of the underlying operating system
as well as new releases of the cluster components.

### Goals

The goal is to create all OKD artifacts in OpenShift's Prow instance from the canonical OpenShift master and release branches.
The goals are to achieve rolling OKD 4 Releases, based on Fedora CoreOS and the current OpenShift codebase,
as well as facilitation of community participation, adoption, feedback and knowledge sharing.

### Non-Goals

The migration of existing OCP clusters to Ignition Config Spec v3 is not part of this proposal, however the functionality proposed here will facilitate that in the future.
This enhancement does not concern itself with the releases of optional operators for OKD via OperatorHub.

## Proposal

There are four parts to this proposal: Changes concerning Prow, MCO, Samples Operator and Installer.

### 1: Setup Prow Jobs
The definitions of OKD build jobs for machine-os-content containers will be added to the `openshift/release` repository to live side by side with the OCP job definitions.

This is mostly done.

### 2: Machine Config Operator Changes

These changes are requirements for OKD 4 GA.

#### Pulling Machine Config Daemon from release payload

Fedora CoreOS images do not contain the Machine Config Daemon (MCD).
In OKD, the MCD (which contains the `pivot` command) is pulled onto the nodes by extracting it from the Machine Config Operator image. Then it can proceed to pivot into the current OS version specified in the machine config.

The proposal is to align OCP with this, so the MCD is always pulled from image and is not needed in the RHEL CoreOS image compose.

One benefit of this is that version drift between MCD from the boot image and from the current machine-os-content is ruled out, as the MCD will always come from the MCO image from the current payload.

Note: MCO images are currently UBI7 based. For RHEL CoreOS it should be preferred to pull an MCD built inside a UBI8 container.

#### Dual Ignition config spec v2/v3 support in Machine Config Operator

The proposal is to add dual Ignition support to the MCO so that OKD does not require a forked version and can instead be built from the canonical branches in the machine-config-operator repository.

In order to achieve this, the following will be done:

Firstly, the ability to parse and translate v3.x configs to v2.2 will be added. This way the MCO will be able to process v3.x config that is either provided by the installer during install, or by the cluster admin at a later point. The rendered machine config representing the current on-disk state internally will remain spec v2.2 for now.

Secondly, the Machine Config Server will gain the ability to inspect the User Agent HTTP header of the incoming config request to serve a spec version supported by the requesting system. For this, the canonical rendered machine config will be translated up to spec v3 if the User Agent shows an Ignition version larger than 2.0.0.
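The version check described above can be sketched roughly as follows. This is a minimal illustration and not the code from the referenced PR: the function names `parseSemver` and `specMajorFor` are hypothetical, and the sketch assumes the User Agent carries a token of the form `Ignition/<major>.<minor>.<patch>`. Requests without a parsable Ignition token fall back to spec v2 here, which is also an assumption.

```go
package main

import (
	"strconv"
	"strings"
)

// parseSemver splits "major.minor.patch" into three integers.
// Any pre-release or build suffix ("-rc1", "+meta") is dropped first.
func parseSemver(v string) ([3]int, bool) {
	var out [3]int
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// specMajorFor returns the Ignition config spec major version (2 or 3)
// to serve for a given User-Agent header value (hypothetical helper).
// It returns 3 only when the Ignition version is larger than 2.0.0.
func specMajorFor(userAgent string) int {
	const prefix = "Ignition/" // assumed token format
	floor := [3]int{2, 0, 0}
	for _, token := range strings.Fields(userAgent) {
		if !strings.HasPrefix(token, prefix) {
			continue
		}
		ver, ok := parseSemver(strings.TrimPrefix(token, prefix))
		if !ok {
			return 2
		}
		// Compare component by component against 2.0.0.
		for i := range ver {
			if ver[i] != floor[i] {
				if ver[i] > floor[i] {
					return 3
				}
				return 2
			}
		}
		return 2 // exactly 2.0.0 is not "larger than 2.0.0"
	}
	return 2 // no Ignition token found
}
```

In a real server the header value would come from the incoming request (for example via `r.Header.Get("User-Agent")` in `net/http`), and the canonical rendered config would be translated up to spec v3 only when this check selects it.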

This is implemented by this PR: https://github.com/openshift/machine-config-operator/pull/1703

The last commit in that PR already shows that the spec used to store on-disk state internally (i.e. the rendered machine config) could already be switched over to v3 without any breaking changes in external APIs. This however is not required to enable using the same MCO images for OKD and OCP.

### 3: Sample Operators Changes

#### Mark RHEL based operator images as optional

The samples operator degrades on OKD due to the fact that not all images are pullable without a valid pull secret. The behaviour should change so it does not degrade on OKD clusters for users that have not specified a valid pull secret.

Alternatively, all RHEL based images this concerns could be rebuilt on a CentOS or Fedora base, and a mechanism to pull them in case of no valid pull secret could be implemented in `cluster-samples-operator`.

### 4: Installer Changes

These changes are optional for the immediate goal of releasing OKD 4 GA. They should however be strongly considered in order to facilitate dropping the OKD fork of installer sooner rather than later.

#### Add fcos.json

Support for an `fcos.json` to exist alongside `rhcos.json` for OS image references.

This could be enabled either via a build-time flag to produce separate OKD installer artifacts, or an `fcos` or `okd` run-time flag to pass to the installer for it to provision Fedora CoreOS instead of RHEL CoreOS hosts (in that case it would default to contents of `rhcos.json`).
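As a rough sketch of the run-time variant, the selection between the two files could look like the following. The function name `osImageFile` and the idea of passing the variant as a plain string are assumptions made for illustration; the proposal does not specify how the flag would be exposed.

```go
package main

import "fmt"

// osImageFile maps a hypothetical run-time variant value to the file
// that holds the OS image references. An empty variant keeps the
// default of rhcos.json, as described in the text.
func osImageFile(variant string) (string, error) {
	switch variant {
	case "":
		return "rhcos.json", nil
	case "fcos", "okd":
		return "fcos.json", nil
	default:
		return "", fmt.Errorf("unknown OS variant %q", variant)
	}
}
```

A build-time flag would achieve the same effect by fixing the variant when the binary is compiled, at the cost of producing separate installer artifacts.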

#### Add option to disable pull secret to Installer

OKD installs do not require a pull secret, so make it optional.

This could be enabled either via a build-time flag to produce separate OKD installer artifacts, or a run-time flag to signal to the installer to not require a pull secret.

#### Switching Installer to Ignition spec v3

The MachineConfig API vendored in the Installer has to be updated to the current version that uses a RawExtension to store the ignition config.

To facilitate the migration, all functions requiring Ignition types will be abstracted into one package. This is implemented by this PR: https://github.com/openshift/installer/pull/3248

With the MCO's ability to process both spec v2 and v3 configs, switching the installer over to spec v3 is technically possible.

It is imperative that the timing of this change is aligned with RHEL CoreOS images switching to an Ignition v2.x binary that supports spec v3.

Once this migration is complete, it will be possible for OKD to use an installer built off of the same sources.
This is a proposal to:
- Create all OKD artifacts in OpenShift's Prow instance from the canonical OpenShift master branch and the latest release branch for testing,
and release OKD payloads built in Prow from the latest release branch regularly, e.g. every 3 weeks.
- Support Fedora CoreOS as base OS in key OpenShift components (`machine-config-operator` and `installer`) so OKD releases can be made without requiring forked codebases.
- Provide easy-to-access ways for community members to participate:
  - by providing a repository with bug report templates
  - by encouraging and directing community contributors to send patches and bug fixes directly to the upstream OpenShift codebase
  - by creating documentation and sharing knowledge about OKD

### Graduation Criteria

Specifically, success of this proposal will be measured by the following:
- Fedora RPM based `machine-os-content` containers are built continuously, with promotion gating for each minor version release stream.
- `machine-config-operator` sources do not require a fork to run in OKD.
- All container images that are part of operators shipped by `cluster-samples-operator` are continuously available as either Fedora or UBI-based container images.
- [x] [OKD repository](https://github.com/openshift/okd/) exists for community issue and bug triage, and development tracking.
- [x] The UBI-based container images for all components in the release payload are shared between OKD and OCP for testing in Prow (with a few exceptions, see below)
- [x] OKD build and release jobs are added to `openshift/release`
- [x] Fedora RPM- and FCOS ostree-based `machine-os-content` container images are built continuously from the [`openshift/okd-machine-os`](https://github.com/openshift/okd-machine-os/) repository, with promotion gating for each minor version release stream.
- [x] Mirror OKD release payloads from the internal registry to quay.io
- [x] Support and end-to-end CI testing is added for running `machine-config-operator`, and the `machine-config-daemon` on FCOS-based OKD clusters.
- [x] `machine-config-operator` sources do not require a fork to be built for OKD
- [x] Support and end-to-end CI testing is added for building OKD installer binaries
- [ ] `openshift-install` sources do not require a fork to be built for OKD: ([Pull Request](https://github.com/openshift/installer/pull/4453))
- [x] All container images that are part of operators shipped by `cluster-samples-operator` are continuously available as either Fedora, CentOS or UBI-based container images.
- [ ] All container images that are part of `baremetal-operator` are continuously available as either Fedora, CentOS or UBI-based container images.

## Infrastructure Needed [optional]

- Prow CI compute
- Limited Prow CI compute on all platforms for end-to-end testing
- Storage and/or registries for CI artifacts (RPMs, machine-os-content containers, and operator containers)
