Uploading to cloud platforms: GCP #147

Closed · dustymabe opened this issue Feb 20, 2019 · 23 comments
Labels: cloud* (related to public/private clouds) · jira (for syncing to jira)

@dustymabe (Member) commented Feb 20, 2019

This is part of #146 and tracks the work/discussion around uploading to GCP.

@dustymabe dustymabe added the cloud* related to public/private clouds label Feb 20, 2019
@bgilbert (Contributor)

I think this involves the following:

  • Have GCP create a project to host the images. That project is traditionally called <something>-cloud. We should decide whether the project should be FCOS-specific or Fedora-wide. Our GCP contacts would like to meet with someone from releng to start setting this up.
  • Adapt plume for FCOS.
  • After testing, have GCP mark the project public.

@bgilbert bgilbert changed the title Uploading to cloud platforms: GCE Uploading to cloud platforms: GCP Mar 20, 2019
@dustymabe (Member, Author)

related: coreos/coreos-assembler#493

@miabbott miabbott added the jira for syncing to jira label Sep 13, 2019
@bgilbert (Contributor) commented Oct 9, 2019

Infra ticket to create the requisite GCP projects.

@cgwalters (Member)

I'm working on some GCP bits for RHCOS (see e.g. openshift/installer#2921), and not having FCOS there inhibits doing things upstream first.

@dghubble (Member)

With manually uploaded GCP images, Zincati checks for updates and reboots promptly (speedy!) after boot, which has an interesting cascading effect of interrupting initial cluster bootstrapping that I noticed today. Being able to use a latest GCP channel image would have the nice side benefit of (mostly) clearing up these immediate-reboot issues.

@cgwalters (Member)

@dghubble slightly related: coreos/zincati#251. But you probably need to be using https://github.com/coreos/airlock or an equivalent if you aren't.

@dustymabe dustymabe self-assigned this Mar 29, 2020
@LorbusChris (Contributor)

I think #392 is also related here.

In OKD we're working around this by adding an /etc/zincati/config.d/90-disable-feature.toml via Ignition that explicitly disables Zincati updates: https://github.com/openshift/installer/blob/fcos/data/data/bootstrap/files/etc/zincati/config.d/90-disable-feature.toml
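For context, that dropin almost certainly boils down to Zincati's documented updates.enabled knob; a minimal sketch, not copied from the linked file:

# /etc/zincati/config.d/90-disable-feature.toml
# Disable Zincati's auto-update feature entirely.
[updates]
enabled = false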

@bgilbert (Contributor)

@LorbusChris That approach makes sense when the cluster has its own mechanism for updating nodes. In general, though, we shouldn't encourage users to disable updates.

@LorbusChris (Contributor)

Indeed, we shouldn't encourage that in general. My thinking was that in this specific case of a (probably short-lived) bootstrap host it may make some sense :)

@dghubble (Member)

Thanks for the suggestions. For now I'm content to follow this issue for GCP uploads. Manually uploading new images, or sometimes having to retry the bootstrap in this one case, is OK with me, as I'd like to keep auto-updates enabled (it's a master/controller, rather than a throwaway like OKD uses, I think).

Mainly I'm just highlighting that, until the channel is in place, stale manual images can cause reboots (e.g. on new instance creation, or after GCE preemption) that wouldn't otherwise be needed.

@lucab (Contributor) commented Mar 31, 2020

@dghubble good feedback points on poseidon/typhoon#687. I have a few more things for you:

  1. Zincati only runs after boot-complete.target. If your bootstrapping flow needs to happen before any update, you can delay reaching that target until it's done (see the unit sketch after this list).
  2. If you are bootstrapping from scratch and can't afford to run a lock-managing service, it sounds like you may want "updates: new strategy based on local filesystem" (zincati#245) implemented.
  3. Similar to the above, path-activation may be interesting too, but I'm unsure how to plug the default-on behavior and Ignition-based disabling into that.
  4. In general, a deterministic trigger is preferable to an arbitrary timer/delay. Both timer-activation and an SSH delay would just make races harder to detect.
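For item 1, a minimal sketch of the delaying mechanism, assuming a oneshot unit ordered before boot-complete.target (the unit name and script path are hypothetical, not from this thread):

# /etc/systemd/system/hold-boot-complete.service (hypothetical)
[Unit]
Description=Hold boot-complete.target until cluster bootstrap finishes
Before=boot-complete.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical helper that blocks until bootstrapping reports success
ExecStart=/usr/local/bin/wait-for-bootstrap

[Install]
RequiredBy=boot-complete.target

Until this unit exits successfully, boot-complete.target is not reached, so Zincati holds off on update checks.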

@dghubble (Member) commented Apr 1, 2020

Thanks @lucab! I think I'd want to aim for the lock-managing approach personally. A local-file strategy could help in a pinch here (for bootstrap), and later an airlock atop the cluster could get (roughly) one-at-a-time node updates (I don't have strong consistency needs). I looked at airlock's etcd strategy, but I'm reluctant to give it that access policy-wise, or to deal with giving it a TLS client cert, etc. I'd picture airlock (or similar) maybe running as a pod with RBAC to write a ConfigMap or annotation, as a bit of scratch space for a locked yes/no. airlock is looking promising, and simpler than CLUO was, since it avoids the coordinator/DaemonSet.

For 1, I'd worry about blocking the target on future reboots (bootstrap happens once per cluster lifetime). For 3, yeah, kinda; and agreed that 4 may not be the best fit for this situation.

Maybe I'll move this to a (new?) issue so I don't hijack.

@dustymabe (Member, Author)

Some updates...

> I think this involves the following:
>
> • Have GCP create a project to host the images. That project is traditionally called <something>-cloud. We should decide whether the project should be FCOS-specific or Fedora-wide. Our GCP contacts would like to meet with someone from releng to start setting this up.

We have a fedora-coreos-cloud project now.

> • Adapt plume for FCOS.

We adapted ore within COSA to support uploading to GCP and doing what we need, and we modified the pipeline to start doing the upload.

> • After testing, have GCP mark the project public.

GCP is working on making the fedora-coreos-cloud project public. In the meantime, I've individually marked the images from our latest testing and stable releases as public, so you should be able to use them today:

# gcloud compute images list --project fedora-coreos-cloud --no-standard-images --show-deprecated 
NAME                                       PROJECT              FAMILY                       DEPRECATED  STATUS
fedora-coreos-31-20200323-3-2-gcp-x86-64   fedora-coreos-cloud  fedora-coreos-stable                     READY
fedora-coreos-31-20200407-2-2-gcp-x86-64   fedora-coreos-cloud  fedora-coreos-testing                    READY
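To try one of these out, something along these lines should work (the instance name and zone are placeholders; --image-family picks the newest non-deprecated image in the family):

# gcloud compute instances create fcos-test \
    --zone=us-central1-a \
    --image-project=fedora-coreos-cloud \
    --image-family=fedora-coreos-stable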

@vrutkovs (Member)

Is there a bucket with the .tar.gz artifact uploaded, so that OKD could reuse it?

@dustymabe (Member, Author)

The URL for the tar.gz within GCP is in the meta.json. My understanding is that it's accessible to any authenticated user. Previously I had considered the file in the image bucket to be purely ephemeral (i.e. only needing to exist for the image import to happen), but I've recently learned that OpenShift creates images on the fly during install. I'd like to understand why we need to do that if an image is already available publicly (and globally). Different guestOsFeatures? Encryption?
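For reference, importing such a tar.gz into a project is roughly the following; the bucket path and feature list here are illustrative placeholders, not the values the pipeline actually uses:

# gcloud compute images create my-fcos-image \
    --source-uri=gs://my-bucket/fedora-coreos-gcp.tar.gz \
    --guest-os-features=VIRTIO_SCSI_MULTIQUEUE,UEFI_COMPATIBLE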

@cgwalters (Member)

> but I've recently learned that OpenShift creates images on the fly during install. I'd like to understand why we need to do that if an image is already available publicly (and globally). Different guestOsFeatures? Encryption?

The idea, broadly speaking, was that RHEL CoreOS is an implementation detail of OpenShift; hence, there wasn't a desire to publish pre-built images globally. This might change at some point.

It would seem reasonable to me, though, to change openshift-install to support directly using a public GCP image if one is available; it'd just require some conditionals in the Terraform logic.

@dustymabe (Member, Author)

>> but I've recently learned that OpenShift creates images on the fly during install. I'd like to understand why we need to do that if an image is already available publicly (and globally). Different guestOsFeatures? Encryption?
>
> The idea, broadly speaking, was that RHEL CoreOS is an implementation detail of OpenShift; hence, there wasn't a desire to publish pre-built images globally. This might change at some point.

Yeah, I think that makes sense. Since we're already publishing images for Fedora CoreOS, I think it makes sense to do what you suggest below 👇🏼 for Fedora CoreOS/OKD.

> It would seem reasonable to me, though, to change openshift-install to support directly using a public GCP image if one is available; it'd just require some conditionals in the Terraform logic.

@dustymabe (Member, Author) commented May 7, 2020

remaining items here:

@dustymabe (Member, Author)

updated #147 (comment) with current status.

@lucab (Contributor) commented May 15, 2020

@dghubble yes, feel free to move this to another ticket or a forum discussion. From your last reply, it sounds like a volatile config fragment in /run, making coreos/zincati#245 apply on first boot only, could be a good solution.
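To illustrate the volatile-fragment idea: /run is a tmpfs, so a dropin that Ignition writes there exists only for the first boot and vanishes afterwards. A minimal sketch using Zincati's documented updates.enabled knob (the filename is hypothetical, and this disables updates outright rather than using the #245 strategy, which hadn't landed yet):

# /run/zincati/config.d/90-first-boot-hold.toml (hypothetical path)
# /run is volatile: this fragment is gone after the first reboot,
# so updates resume automatically on subsequent boots.
[updates]
enabled = false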

@dustymabe (Member, Author)

updated #147 (comment) with current status.

@dghubble (Member)

I'm using the newly uploaded GCP images instead of manually uploading. Many thanks for these!

# Terraform example
# Fedora CoreOS most recent image from stream
data "google_compute_image" "fedora-coreos" {
  project = "fedora-coreos-cloud"
  family  = "fedora-coreos-${var.os_stream}"
}
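For completeness, the data source's self_link can then feed an instance's boot disk; a sketch where the name, machine type, and zone are placeholders:

resource "google_compute_instance" "controller" {
  name         = "fcos-controller" # placeholder
  machine_type = "e2-standard-2"   # placeholder
  zone         = "us-central1-a"   # placeholder

  boot_disk {
    initialize_params {
      # Latest image resolved from the family above
      image = data.google_compute_image.fedora-coreos.self_link
    }
  }

  network_interface {
    network = "default"
  }
}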

@dustymabe (Member, Author)

OK, all the items in #147 (comment) are now done or broken out into other tickets. I also created a new ticket for getting this plumbed through to our download page: #494

I'm going to mark this as done. It was a long journey.
