
WIP: doc: Begin a document on adding a new OpenShift platform #1112

Closed

Conversation

smarterclayton (Contributor):

This covers the minimal steps and process to go from "nothing" to
"OpenShift is fully capable of running on your platform". Heavily
work in progress, but should capture the why, our support levels,
and our target config, as well as mechanical steps to get down the
line.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 22, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 22, 2019

### Enable core platform

1. **Boot** - Ensure RH CoreOS boots on the desired platform, that Ignition works, and that you have VM / machine images to test with
Member:

I'd also note here that for new cloud platforms, Ignition may need support upstream. For example here's a PR for a non-top-tier cloud: coreos/ignition#667


To boot RHCoS to a new platform, you must:

1. Ensure ignition supports that platform via an OEM ID
Member:

Ahh I see you cover this here. I'd point to coreos/fedora-coreos-tracker#95
and actually in an ideal world patches land in FCOS first and we later backport them.

Contributor Author:

I'd prefer to reference ignition directly for now so as to make it clear what the priority ordering is.

Contributor Author:

The link is what I was alluding to
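As a rough illustration of the OEM ID mechanism being discussed, Ignition-style platform detection reads a platform identifier associated with the VM image, for example from the kernel command line. This is a hedged sketch only: the function and fallback value are invented for illustration, and the flag names (`ignition.platform.id` for newer Ignition, `coreos.oem.id` historically) are the author's best understanding rather than something stated in this thread.

```python
# Illustrative sketch of an Ignition-style OEM/platform ID lookup.
# The helper name and "metal" fallback are assumptions for illustration.

def parse_platform_id(cmdline: str) -> str:
    """Extract a platform/OEM ID from a kernel command line string."""
    # Check both the newer and the legacy key names.
    for key in ("ignition.platform.id", "coreos.oem.id"):
        for token in cmdline.split():
            if token.startswith(key + "="):
                return token.split("=", 1)[1]
    return "metal"  # no key present: assume a bare-metal default

print(parse_platform_id("ro ignition.platform.id=aws console=ttyS0"))
```

The point of the sketch is simply that each new platform needs its own recognized identifier before Ignition can select platform-specific fetch behavior.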

5. **Enable Provisioning** Add a hidden installer option to this repo for the desired platform as a PR and implement the minimal features for bootstrap as well as a reliable teardown
6. **Enable Platform** Ensure all operators treat your platform as a no-op
7. **CI Job** Add a new CI job to the installer that uses the credentials above to run the installer against the platform and correctly tear down resources
8. **Publish Images** Ensure RH CoreOS images on the platform are being published to a location CI can test
Member:
Should publish images be before #7?

Contributor Author:

not actually required to get the PR up, which is just why I ordered it (you can publish one yourself into the CI infra)

5. Do *not* have automatic cloud provider permissions to perform infrastructure API calls
6. Have a domain name pointing to the load balancer IP(s) that is `api.<BASE_DOMAIN>`
7. Has an internal DNS CNAME pointing to each master called `etcd-N.<BASE_DOMAIN>` that
8. Has an optional internal load balancer that TCP load balances all master nodes, with a DNS name `internal-api.<BASE_DOMAIN>` pointing to the load balancer.
Member:

Is the DNS name optional too (or just the load balancer)? Would the external DNS need the internal-api name registered for use by the cluster without internal DNS?

Contributor Author:

DNS isn't optional for cert signing, but I guess you could technically sign your IP.
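To make the naming requirements above concrete, here is a small sketch that generates the record names a new platform would need, assuming the `api.`/`etcd-N.`/`internal-api.` scheme from the requirements list. The helper is hypothetical, not installer code, and the example domain is a placeholder.

```python
# Hypothetical helper: enumerate the DNS names the requirements call for.

def expected_dns_records(base_domain: str, masters: int) -> dict:
    """Return the DNS names a platform must provide for a cluster."""
    return {
        "api": f"api.{base_domain}",                    # public API load balancer
        "internal-api": f"internal-api.{base_domain}",  # optional internal LB
        # One internal CNAME per master for etcd discovery: etcd-0, etcd-1, ...
        "etcd": [f"etcd-{i}.{base_domain}" for i in range(masters)],
    }

recs = expected_dns_records("mycluster.example.com", 3)
print(recs["api"])
print(recs["etcd"])
```

As the thread notes, the DNS names themselves are not optional (certificates are signed against them), even where the internal load balancer is.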

2. **Arch** - Identify the correct opinionated configuration for a desired platform supporting the default features.
3. **CI** - Identify credentials and setup for a CI environment, ensure those credentials exist and can be used in the CI environment
4. **Name** - Identify and get approved the correct naming for adding a new platform to the core API objects (specifically the [infrastructure config](https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go) and the installer config (https://github.com/openshift/installer/blob/master/pkg/types/aws/doc.go)) so that we are consistent
5. **Enable Provisioning** Add a hidden installer option to this repo for the desired platform as a PR and implement the minimal features for bootstrap as well as a reliable teardown
Member:

Can we chronicle those out in a separate doc or section? Below, we identify the DNS and load balancer requirements ( L48-L76). We should be able to identify those, bucket and networking reqs for the current product and identify the IPI and UPI behaviors/expectations of those components.

Contributor Author:

Those should be in Enable Provisioning.

3. Have low-latency interconnects (<5ms RTT) and persistent disks that survive reboot and are provisioned for at least 300 IOPS
4. Have cloud or infrastructure firewall rules that at minimum allow the standard ports to be opened (see AWS provider)
5. Do *not* have automatic cloud provider permissions to perform infrastructure API calls
6. Have a domain name pointing to the load balancer IP(s) that is `api.<BASE_DOMAIN>`
Member:

<CLUSTER_NAME>-api.<BASE_DOMAIN>

4. Have cloud or infrastructure firewall rules that at minimum allow the standard ports to be opened (see AWS provider)
5. Do *not* have automatic cloud provider permissions to perform infrastructure API calls
6. Have a domain name pointing to the load balancer IP(s) that is `api.<BASE_DOMAIN>`
7. Has an internal DNS CNAME pointing to each master called `etcd-N.<BASE_DOMAIN>` that
Member:

<CLUSTER_NAME>-etcd-N.<BASE_DOMAIN>

@openshift-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

3. **CI** - Identify credentials and setup for a CI environment, ensure those credentials exist and can be used in the CI environment
4. **Name** - Identify and get approved the correct naming for adding a new platform to the core API objects (specifically the [infrastructure config](https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go) and the installer config (https://github.com/openshift/installer/blob/master/pkg/types/aws/doc.go)) so that we are consistent
5. **Enable Provisioning** Add a hidden installer option to this repo for the desired platform as a PR and implement the minimal features for bootstrap as well as a reliable teardown
6. **Enable Platform** Ensure all operators treat your platform as a no-op
Member:

If we have a general policy for "operators treat unrecognized platforms as if they were none", then this step would not be required when adding a new platform.

Member:

Ah, you have that policy down here. I think you can drop this list entry, and we can file bugs with any operators that are currently non-compliant.

Once the platform can be launched and tested, system features must be implemented. The sections below are roughly independent:

* General requirements:
* Replace the installer terraform destroy with one that doesn't rely on terraform state
Member:

nit: "terraform" -> "Terraform".

And maybe mention that this is because, once cluster components can create additional resources on the target platform, we'll still need to clean them up, and Terraform won't know about them.
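A state-free destroy, as the comment above suggests, typically means discovering resources by a cluster-ownership tag rather than reading Terraform state, so that resources created after installation (by cluster components Terraform never saw) are cleaned up too. The sketch below is hypothetical: the tag key mirrors the `kubernetes.io/cluster/<name>` ownership convention, and the resource representation stands in for whatever a real cloud API returns.

```python
# Hypothetical sketch of tag-based cluster teardown (not installer code).
CLUSTER_TAG = "kubernetes.io/cluster/{name}"

def destroy_cluster(resources: list, cluster_name: str):
    """Partition resources into (deleted, remaining) by ownership tag.

    `resources` is a list of dicts with a "tags" mapping, standing in
    for a real cloud inventory API.
    """
    tag_key = CLUSTER_TAG.format(name=cluster_name)
    deleted = [r for r in resources if r.get("tags", {}).get(tag_key) == "owned"]
    remaining = [r for r in resources if r not in deleted]
    return deleted, remaining
```

Because deletion is driven by what the platform reports as tagged, the approach works even when local Terraform state is missing or stale.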

1. Runs RH CoreOS
2. Is reachable by control plane nodes over the network
3. Is part of the control plane load balancer until it is removed
4. Can reach a network endpoint that hosts the bootstrap ignition file securely, or has the bootstrap ignition injected
wking (Member), Jan 30, 2019:

nit: "ignition" -> "Ignition" here and elsewhere in this doc.

The following clarifications to configurations are noted:

1. The control plane load balancer does not need to be exposed to the public internet, but the DNS entry must be visible from the location the installer is run.
2. Master nodes are not required to expose external IPs for SSH access, but can instead allow SSH from a bastion inside a protected network.
Member:

Drop "Master" and the following list entry? This applies equally to master and compute nodes; I don't see an upside to splitting over two entries.


Red Hat CoreOS uses ignition to receive initial configuration from a remote source. Ignition has platform specific behavior to read that configuration that is determined by the `oemID` embedded in the VM image.

To boot RHCoS to a new platform, you must:
wking (Member), Jan 30, 2019:

nit: "RHCoS" -> "RHCOS", here and elsewhere in this doc? I think the acronym is [R]ed [H]at [C]ore [OS], not, [R]ed [H]at [Co]re O[S].

Continuous Integration
----------------------

To enable a new platform, require a core continuous integration testing loop that verifies that new changes do not regress our support for the platform. The minimum steps required are:
Member:

nit: "require" -> "we require", or similar.


To enable a new platform, require a core continuous integration testing loop that verifies that new changes do not regress our support for the platform. The minimum steps required are:

1. Have an infrastructure that can receive API calls from the OpenShift CI system to provision/destroy instances
Member:

"instances" -> "infrastructure".


1. Add a new hidden provisioner
2. Define the minimal platform parameters that the provisioner must support
3. Use Terraform or direct Go code to provision that platform via the credentials provided to the installer.
Member:

"provision" -> "provision and destroy"? One benefit of Terraform is that it makes centralized bootstrap teardown fairly straightforward, although you could certainly switch on the platform to invoke platform-specific Go bootstrap-teardown code. And we need to destroy resources for destroy cluster to keep the account from filling with cruft, although that doesn't need to be as specific as bootstrap teardown.

2. Define the minimal platform parameters that the provisioner must support
3. Use Terraform or direct Go code to provision that platform via the credentials provided to the installer.

A minimal provisioner must be able to launch the control plane and bootstrap node via an API call and accept any "environmental" settings like network or region as inputs. The installer should use the Route53 DNS provisioning code to set up round robin to the bootstrap and control plane nodes if necessary.
Member:

Is this Route 53 reference intentional? For example, libvirt uses its own DNS configuration for RRDNS, and doesn't involve Route 53.

@smarterclayton (Contributor Author):

/retest


1. The control plane nodes:
1. Run RH CoreOS, allowing in-place updates
2. Are fronted by a load balancer that allows raw TCP connections to port 6443 and exposes port 443
Contributor:

To be IaaS-neutral, wouldn't it be possible to use keepalived (within Kube, since RHCOS is immutable)? It could be used either as an LB or as a failover handler. Not using AWS doesn't automatically mean having a hardware LB in front of a cluster.
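The keepalived idea in the comment above might look roughly like the minimal VRRP fragment below. This is an untested illustration, not configuration from the PR: the instance name, interface, and virtual IP are all placeholders, and the design choice being sketched is a floating VIP elected among control-plane nodes instead of a cloud or hardware load balancer.

```
vrrp_instance api_vip {
    state BACKUP          # all nodes start as BACKUP; VRRP elects a MASTER
    interface eth0        # placeholder interface name
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24     # placeholder VIP fronting the API on port 6443
    }
}
```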

@openshift-ci-robot (Contributor):

openshift-ci-robot commented Oct 4, 2019

@smarterclayton: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Rerun command |
| --- | --- | --- |
| ci/prow/e2e-aws-rhel8 | 974d6cc | /test e2e-aws-rhel8 |
| ci/prow/e2e-aws-upgrade | 974d6cc | /test e2e-aws-upgrade |
| ci/prow/e2e-aws | 974d6cc | /test e2e-aws |
| ci/prow/e2e-aws-disruptive | 974d6cc | /test e2e-aws-disruptive |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya (Contributor):

Closing due to this being open for a long time. Please feel free to reopen.

/close

@openshift-ci-robot (Contributor):

@abhinavdahiya: Closed this PR.

In response to this:

Closing due to this being open for a long time. Please feel free to reopen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@displague mentioned this pull request on Jul 17, 2020 and Dec 10, 2020.