Update OKD proposal
LorbusChris committed May 18, 2020
1 parent e26170f commit 9c7d606
Showing 1 changed file with 99 additions and 16 deletions.
115 changes: 99 additions & 16 deletions enhancements/okd-on-fedora-coreos.md
@@ -9,9 +9,13 @@ reviewers:
- "@runcom"
- "@darkmuggle"
- "@vrutkovs"
- "@sdodson"
- "@crawford"
- "@abhinavdahiya"
- "@cgwalters"
approvers:
creation-date: 2019-10-21
last-updated: 2019-10-21
last-updated: 2020-05-18
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
see-also:
replaces:
@@ -30,45 +34,124 @@ superseded-by:

## Open Questions [optional]

1. This proposal potentially requires addition of non-Koji-built RPMs from OpenShift's Prow CI instance to Fedora CoreOS composes. Can we do this? If yes, how do we do this?
2. What is the specific list of operators that will go into OKD-on-FCoS?
1. When is OCP migrating to Ignition config spec v3?

Please see the Graduation Criteria section for more context on both open questions.

## Summary

A proposal to create components of an OKD variant that runs atop Fedora CoreOS and to use OpenShift's Prow instance to do so.
OKD is the OpenShift community distribution of Kubernetes.

As part of the version 4 effort, the OKD working group has agreed to target Fedora CoreOS (FCoS) as the primary base operating system for builds of version 4 of the OKD Community Distribution that is powered by the OpenShift codebase.
As part of the version 4 effort, the OKD working group has agreed to target Fedora CoreOS (FCoS) as the primary base operating system for OKD.

This document proposes to incrementally add Prow CI jobs to create Fedora-based artifacts for use in OKD version 4.
This document contains a proposal to create all OKD artifacts in OpenShift's Prow instance from the canonical OpenShift master and release branches alongside OCP artifacts.

Eventually, container images for all components will be shared between OKD and OCP, with the exception of the machine-os-content container image which will be based on Fedora CoreOS content, and any container images that are based on RHEL images in OCP. These will be rebuilt on either a UBI, Fedora or CentOS base for OKD.

Much of this is already done. What remains is un-forking the Machine Config Operator and (optionally) the Installer repository and building variant-agnostic artifacts from them, as well as changing the Samples Operator so that it does not go degraded on OKD clusters.

## Motivation

Delivering continuous builds of version 4 of OKD, based on Fedora CoreOS.
Fedora CoreOS includes Ignition v2.x, which performs the first-boot configuration of newly provisioned machines using Ignition configuration specification 3.x, while RHEL CoreOS currently includes and uses Ignition v0.x, which supports config spec 2.x.

Ignition configuration is a descriptive format representing the desired on-disk state of the machine.

The installer uses Terraform to provision the bootstrap machine with a config encoded in a spec understood by the host system. After this initial provisioning of the bootstrap host, a Machine Config Server (MCS) is started on the bootstrap host to serve Ignition config to all nodes joining for the initial installation. Subsequently, the bootstrap host is destroyed and another instance of the MCS on the cluster continues to serve Ignition config for additional nodes that want to join the cluster later.

Only a limited subset of the Ignition config fields is supported by the MCS (i.e. config that describes the on-disk state like files and systemd units).

In OCP this on-disk state representation used within the Installer and the MCS is currently encoded in Ignition spec v2.2 config types, while v3.0 config types are used in OKD. To account for the different spec types, OKD's Installer binary as well as OKD's Machine Config Operator (MCO) container image are currently built from forked codebases.

Long-term maintenance of those forks is not desirable. The MCO fork represents an OKD 4 GA release blocker. The installer fork, while not blocking GA, should be dissolved after OCP migrates to Ignition config spec v3.

### Goals

The broader goal is to create automated builds of all components needed for the OKD-on-FCoS effort, with a preference on running RPM build jobs in Fedora containers, and exclusively using RPMs from official Fedora repositories for Fedora CoreOS composes. Container artifacts from UBI based CI builds for OCP may be re-used, however artifacts that come from RHEL container based builds may not be re-used to build OKD.
The goal is to create all OKD artifacts in OpenShift's Prow instance from the canonical OpenShift master and release branches.

### Non-Goals

The migration to Ignition Spec v3 is not part of this proposal.
The migration of existing OCP clusters to Ignition Config Spec v3 is not part of this proposal; however, the functionality proposed here will facilitate that in the future.

## Proposal

In order to create automated builds of OKD-on-FCoS, OpenShift's Prow instance shall be used to run jobs to build all required components/artifacts in Fedora containers. The definitions of these jobs will live side by side with the already existing definitions for OCP jobs in the `openshift/release` repository, or in the respective operators' repos. Operators that will not go into OKD can be disregarded from this effort.
There are four parts to this proposal: Changes concerning Prow, MCO, Samples Operator and Installer.

### 1: Set Up Prow Jobs
The definitions of OKD build jobs for machine-os-content containers will be added to the `openshift/release` repository to live side by side with the OCP job definitions.

This is mostly done.

### 2: Machine Config Operator Changes

These changes are requirements for OKD 4 GA.

#### Pulling Machine Config Daemon from release payload

Fedora CoreOS images do not contain the Machine Config Daemon (MCD).
In OKD, the MCD (which contains the `pivot` command) is pulled onto the nodes by extracting it from the Machine Config Operator image. Then it can proceed to pivot into the current OS version specified in the machine config.

The proposal is to align OCP with this, so that the MCD is always pulled from the image and is not needed in the RHEL CoreOS image compose.

One benefit of this is that version drift between MCD from the boot image and from the current machine-os-content is ruled out, as the MCD will always come from the MCO image from the current payload.

Note: MCO images are currently UBI7 based. For RHEL CoreOS, pulling an MCD built inside a UBI8 container would be preferable.

#### Dual Ignition config spec v2/v3 support in Machine Config Operator

The proposal is to add dual Ignition support to the MCO so that OKD does not require a forked version and can instead be built from the canonical branches in the machine-config-operator repository.

In order to achieve this, the following will be done:

Firstly, the ability to parse and translate v3.x configs to v2.2 will be added. This way the MCO will be able to process v3.x config that is either provided by the installer during install, or by the cluster admin at a later point. The rendered machine config representing the current on-disk state internally will remain spec v2.2 for now.
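As a rough illustration of what such a down-translation involves, the sketch below converts a very small subset of a spec v3 config (files and systemd units only) into a v2.2-shaped config. It is not the MCO's implementation: the function name `translate_v3_to_v22` and the sample config are hypothetical, and the field handling reflects one reading of the two specs (spec 2.2 file entries name a `filesystem`, and the default for `overwrite` differs between the two specs, so it is written out explicitly).

```python
import copy


def translate_v3_to_v22(cfg_v3: dict) -> dict:
    """Down-translate a small subset of an Ignition spec 3.x config to 2.2.

    Hypothetical helper for illustration only: handles storage.files and
    systemd.units and ignores every other section.
    """
    version = cfg_v3.get("ignition", {}).get("version", "")
    if not version.startswith("3."):
        raise ValueError(f"expected a spec 3.x config, got {version!r}")

    files_v22 = []
    for f in cfg_v3.get("storage", {}).get("files", []):
        entry = {
            # Assumption: spec 2.2 file entries must name a filesystem.
            "filesystem": "root",
            "path": f["path"],
            # Assumption: spec 3 defaults overwrite to false while spec 2.2
            # defaults to true, so the value is always set explicitly.
            "overwrite": f.get("overwrite", False),
        }
        if "mode" in f:
            entry["mode"] = f["mode"]
        if "contents" in f:
            entry["contents"] = copy.deepcopy(f["contents"])
        files_v22.append(entry)

    units = copy.deepcopy(cfg_v3.get("systemd", {}).get("units", []))
    return {
        "ignition": {"version": "2.2.0"},
        "storage": {"files": files_v22},
        "systemd": {"units": units},
    }


# Hypothetical sample config containing one file and one systemd unit.
example_v3 = {
    "ignition": {"version": "3.0.0"},
    "storage": {
        "files": [
            {
                "path": "/etc/example.conf",
                "mode": 420,
                "contents": {"source": "data:,hello%0A"},
            }
        ]
    },
    "systemd": {"units": [{"name": "example.service", "enabled": True}]},
}
translated = translate_v3_to_v22(example_v3)
```

A real translation has to cover many more fields and reject v3 features that have no v2.2 equivalent; the sketch only shows the general shape of the mapping.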

Secondly, the Machine Config Server will gain the ability to inspect the `User-Agent` HTTP header of the incoming config request in order to serve a spec version supported by the requesting system. For this, the canonical rendered machine config will be translated up to spec v3 if the `User-Agent` header shows an Ignition version greater than 2.0.0.

This is implemented by this PR: https://github.com/openshift/machine-config-operator/pull/1703

The last commit in that PR already shows that the spec used to store on-disk state internally (i.e. the rendered machine config) could already be switched over to v3 without any breaking changes in external APIs. This however is not required to enable using the same MCO images for OKD and OCP.
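The version check that the Machine Config Server performs on incoming requests could look roughly like the following sketch. It is not taken from the linked PR: the function name `spec_major_for_user_agent` is hypothetical, the header is assumed to have the form `Ignition/<major>.<minor>.<patch>`, and falling back to spec v2 for a missing or unrecognized header is an assumption made here so that the canonical v2.2 config is served by default.

```python
import re

# Assumption: Ignition identifies itself as "Ignition/<major>.<minor>.<patch>".
_IGNITION_UA = re.compile(r"Ignition/(\d+)\.(\d+)\.(\d+)")


def spec_major_for_user_agent(user_agent):
    """Return the Ignition spec major version (2 or 3) to serve.

    Hypothetical helper: a missing or unrecognized header falls back to 2.
    """
    match = _IGNITION_UA.search(user_agent or "")
    if match is None:
        return 2
    version = tuple(int(part) for part in match.groups())
    # Follows the wording of the proposal: only versions strictly greater
    # than 2.0.0 are served a config translated up to spec v3.
    return 3 if version > (2, 0, 0) else 2
```

With this sketch, a request carrying `Ignition/2.3.0` would be served spec v3, while `Ignition/0.35.0` or a request without the header would be served the canonical v2.2 config.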

### 3: Samples Operator Changes

#### Mark RHEL based operator images as optional

The samples operator degrades on OKD due to the fact that not all images are pullable without a valid pull secret. The behaviour should change so it does not degrade on OKD clusters for users that have not specified a valid pull secret.

Alternatively, all RHEL based images this concerns could be rebuilt on a CentOS or Fedora base, and a mechanism to pull them in case of no valid pull secret could be implemented in `cluster-samples-operator`.

### 4: Installer Changes

These changes are optional for the immediate goal of releasing OKD 4 GA. They should, however, be strongly considered in order to facilitate dropping the OKD fork of the installer sooner rather than later.

#### Add fcos.json

Support for an `fcos.json` to exist alongside `rhcos.json` for OS image references.

This could be enabled either via a build-time flag to produce separate OKD installer artifacts, or via an `fcos` or `okd` run-time flag passed to the installer to make it provision Fedora CoreOS instead of RHEL CoreOS hosts (with a run-time flag, the installer would default to the contents of `rhcos.json` when the flag is not set).
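The run-time variant of this option can be pictured with the short sketch below. It does not describe the installer's actual command line: the `--fcos` flag spelling and the helper `image_metadata_file` are hypothetical, chosen only to show the default falling back to `rhcos.json`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical run-time switch for the OS variant."""
    parser = argparse.ArgumentParser(prog="installer-sketch")
    parser.add_argument(
        "--fcos",
        action="store_true",
        help="provision Fedora CoreOS instead of RHEL CoreOS hosts",
    )
    return parser


def image_metadata_file(args: argparse.Namespace) -> str:
    """Pick the OS image reference file; rhcos.json is the default."""
    return "fcos.json" if args.fcos else "rhcos.json"


# Without the flag the RHEL CoreOS references are used.
default_choice = image_metadata_file(build_parser().parse_args([]))
# With the flag the Fedora CoreOS references are used instead.
fcos_choice = image_metadata_file(build_parser().parse_args(["--fcos"]))
```

A build-time flag would make the same choice when the binary is compiled, so OKD and OCP would ship separate installer artifacts rather than one binary with a switch.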

#### Add option to disable pull secret to Installer

OKD installs do not require a pull secret, so make it optional.

This could be enabled either via a build-time flag to produce separate OKD installer artifacts, or a run-time flag to signal to the installer to not require a pull secret.

#### Switching Installer to Ignition spec v3

The MachineConfig API vendored in the Installer has to be updated to the current version that uses a RawExtension to store the ignition config.

To facilitate the migration, all functions requiring Ignition types will be abstracted into one package. This is implemented by this PR: https://github.com/openshift/installer/pull/3248

With the MCO's ability to process both spec v2 and v3 configs, switching the installer over to spec v3 is technically possible.

It is imperative that the timing of this change is aligned with RHEL CoreOS images switching to an Ignition v2.x binary that supports spec v3.

Once this migration is complete, it will be possible for OKD to use an installer built off of the same sources.

### Graduation Criteria

Specifically, success of this proposal will be measured by the following:
- RPMs that are dependencies of OpenShift's codebase in the base OS are built continuously in a Fedora environment: `cri-o`, `openshift-hyperkube`, `openshift-clients`, `machine-config-daemon`
- **Open Question**: How are builds promoted from Prow CI to FCoS release composes?
- Possible alternative to Prow promotion: Automated Fedora Koji RPM builds with packit-service CI (would need implementation work)
- Fedora CoreOS composes that contain up-to-date versions of the aforementioned dependencies are built continuously.
- Fedora RPM based `machine-os-content` containers are built continuously, with promotion gating for each minor version release stream.
- All Operators that are part of OKD-on-FCoS are continuously available as either Fedora or UBI-based container images.
- **Open Question**: What is the specific list of operators that will go into OKD-on-FCoS?
- `machine-config-operator` sources do not require a fork to run in OKD.
- All container images that are part of operators shipped by `cluster-samples-operator` are continuously available as either Fedora or UBI-based container images.

## Infrastructure Needed [optional]
