Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mantle/ore: gcp: add image family support, add deprecate image functionality #1319

Merged
merged 7 commits into from
Apr 7, 2020

Conversation

dustymabe
Copy link
Member

In GCP we'd like to use Image Families [1] so that a user can specify an
Image Family (like fedora-coreos-stable) and that will always reference
the latest stable Fedora CoreOS GCP image. The way image families work
is that when the image family is referenced the latest non-deprecated
image from that Image Family is returned. We'd like to have only the
latest released image be in a non-deprecated state in our image families
at any given time.

The strategy for how we use this mechanism is that we create images
in our pipeline when the pipeline runs. The image will be associated
with an image family at that time (unfortunately you can't associate
an image with an image family afterwards), but we will immediately
mark it as deprecated so it won't be used. Upon running the release
pipeline we will mark the new image as ACTIVE and mark the old ACTIVE
image as DEPRECATED.

[1] https://cloud.google.com/compute/docs/images#image_families

@dustymabe dustymabe changed the title mantle/ore: add gcp deprecate image functionality mantle/ore: gcp: add image family support, add deprecate image functionality Apr 6, 2020
@dustymabe
Copy link
Member Author

this builds on #1305

Comment on lines 72 to 74
# Here is where I would do something like:
if args.deprecate:
run_verbose(ore gcloud deprecate-image --image-name gcp_name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we could either encode a seperate call to ore here if the user requested the image to be created in the deprecated state OR we could modify ore gcloud upload to accept a --deprecated argument as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to my understanding there are two different times when we'd want to deprecate an image:

  1. When we first upload it
  2. To mark the old active image as deprecated in preparation for a new active image

For the first case that should probably be the default behavior performed by ore gcloud upload (with a flag to turn off that behavior). The second should always be done as a separate operation so that it's more obvious and intentional that an image (which is not the one being uploaded) is being marked as deprecated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to my understanding there are two different times when we'd want to deprecate an image:

1. When we first upload it

2. To mark the old active image as deprecated in preparation for a new active image

correct. with a slight addition to 2. For 1 it happens when we
first do the upload (during the pipeline run). For 2 it would happen
during the release and we'd need to do two things:

  • set deprecation status to "DEPRECATED" for the image that is currently latest in image family
  • set deprecation status to "ACTIVE" of new image

So we'd make two calls to do that.

For the first case that should probably be the default behavior performed by ore gcloud upload (with a flag to turn off that behavior). The second should always be done as a separate operation so that it's more obvious and intentional that an image (which is not the one being uploaded) is being marked as deprecated.

Since uploading an image that you immediately want to deprecate is kind of a special weird use case we're implementing I thought it would be more appropriate to leave it as not the default and explicitly pass --deprecated or something in the pipeline. For example, I think RHCOS uses this functionality and I don't think we want to change the defaults there right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. with a slight addition to 2. For 1 it happens when we
first do the upload (during the pipeline run). For 2 it would happen
during the release and we'd need to do two things:

  • set deprecation status to "DEPRECATED" for the image that is currently latest in image family
  • set deprecation status to "ACTIVE" of new image

So we'd make two calls to do that.

I'm personally alright with it being two calls (with the caveat that it might make sense to have it be one COSA command which performs both ore actions).

Since uploading an image that you immediately want to deprecate is kind of a special weird use case we're implementing I thought it would be more appropriate to leave it as not the default and explicitly pass --deprecated or something in the pipeline. For example, I think RHCOS uses this functionality and I don't think we want to change the defaults there right now.

That's fair. We could potentially do something like switching the default for FCOS (since we're already passing a flag directly indicating if it's an FCOS upload) but I'd rather stick to just directly passing it in the pipeline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I was thinking. Cool I'll start down that path.


func init() {
cmdDeprecateImage.Flags().StringVar(&deprecateImageName,
"image-name", "", "GCP image name")
Copy link
Member Author

@dustymabe dustymabe Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I couldn't figure out either what the global gcloud image option is used for:

sv(&opts.Image, "image", "", "image name")

or how to grab it and use it here. This kept me from just using --image as the argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just be able to reference opts.Image to get the value that was passed into the --image argument (which is on all commands of the GCloud variable.

Copy link
Member Author

@dustymabe dustymabe Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'm now using opts.Image, but this did require me to slightly change the init code. See the mantle/gcloud: don't mutate Image arg if not provided commit.

@dustymabe dustymabe force-pushed the dusty-gcp-ore-work branch 2 times, most recently from 278ee0f to c402076 Compare April 6, 2020 17:19

func init() {
cmdDeprecateImage.Flags().StringVar(&deprecateImageName,
"image-name", "", "GCP image name")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just be able to reference opts.Image to get the value that was passed into the --image argument (which is on all commands of the GCloud variable.

os.Exit(1)
}

fmt.Printf("Attemping to change GCP image deprecation state of %s to %s\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably fits better as a plog.Debugf statement to gate it behind the user requesting more verbose output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I switched this to plog.Debugf and also the other statements in the file to plog.Fatal*. The one thing, I can't figure out how to get the debug statement to print out. I've tried --debug, --log-level=DEBUG, and --verbose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOB IRC discussion happened but looks like you've found a bug: #1321

Comment on lines 72 to 74
# Here is where I would do something like:
if args.deprecate:
run_verbose(ore gcloud deprecate-image --image-name gcp_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to my understanding there are two different times when we'd want to deprecate an image:

  1. When we first upload it
  2. To mark the old active image as deprecated in preparation for a new active image

For the first case that should probably be the default behavior performed by ore gcloud upload (with a flag to turn off that behavior). The second should always be done as a separate operation so that it's more obvious and intentional that an image (which is not the one being uploaded) is being marked as deprecated.

@dustymabe dustymabe force-pushed the dusty-gcp-ore-work branch 2 times, most recently from 978a045 to 5e19713 Compare April 6, 2020 22:21
@dustymabe
Copy link
Member Author

rebased this on top of #1322

Right now this just adds support for image family to upload.go and also support for a new deprecate-image subcommand. It doesn't currently change buildextend-gcp to deprecate anything by default.

I discuss the flow with @bgilbert in IRC and it seems like we may want to go a different path so we'll leave that bit for a followup.

This is ready for review, but I'm still doing some testing in the background so I'll leave it marked as draft.

@@ -106,4 +108,7 @@ def gcp_cli(parser):
Currently enables SECURE_BOOT and UEFI_COMPATIBLE""",
action="store_true",
default=False)
parser.add_argument("--family",
help="GCP image family to attach disk to",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/disk/image/

@@ -54,6 +55,7 @@ func init() {
cmdUpload.Flags().BoolVar(&uploadFedora, "fcos", false, "Flag this is Fedora CoreOS (or a derivative); currently enables SECURE_BOOT and UEFI_COMPATIBLE")
cmdUpload.Flags().BoolVar(&uploadForce, "force", false, "overwrite existing GS and GCE images without prompt")
cmdUpload.Flags().StringVar(&uploadWriteUrl, "write-url", "", "output the uploaded URL to the named file")
cmdUpload.Flags().StringVar(&uploadImageFamily, "family", "", "GCP image family to attach disk to")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/disk/image/

We want to use Image Families so that a user can specify an Image
Family (like `fedora-coreos-stable`) and that will always reference
the latest stable Fedora CoreOS GCP image. This will allow us to do
that.
If the user didn't specify any Image argument then there's no reason
for us to try to mutate it into something useful. Just leave it as an
empty string so later consumers can also do a useful check.
@dustymabe
Copy link
Member Author

rebase on top of latest master and addressed all code review comments - doing final testing now in a custom build pipeline

In cosalib/aws we don't try to add `s3://` and a prefix/path onto the
provided bucket so let's not do it here either.
@dustymabe
Copy link
Member Author

ok this is ready for review.

cc @cgwalters since the cosalib/gcp: make call to ore pass through provided bucket argument will require a change in the downstream pipeline (which I'll open a PR for after this merges)

help="""Flag this is Fedora CoreOS (or a derivative);
Currently enables SECURE_BOOT and UEFI_COMPATIBLE""",
action="store_true",
default=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this be off by default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also openshift/installer#2921 which merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming we want to have SECURE_BOOT and UEFI_COMPATIBLE everywhere we could just get rid of the flag altogether and bake it in at a lower level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though I'd prefer to do this in a follow up if you don't mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming we want to have SECURE_BOOT and UEFI_COMPATIBLE everywhere we could just get rid of the flag altogether and bake it in at a lower level.

That's exactly the status quo today, it's baked into ore by default but you went out of the way to explicitly disable it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind it's off by default, I was wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up to make SECURE_BOOT and UEFI_COMPATIBLE standard and remove the --fcos option: #1333

@cgwalters
Copy link
Member

cc @cgwalters since the cosalib/gcp: make call to ore pass through provided bucket argument will require a change in the downstream pipeline (which I'll open a PR for after this merges)

Requiring simultaneous changes to a pipeline and cosa is unfortunate. How hard would it be to make this compatible?

@cgwalters
Copy link
Member

High level makes sense, though there are no plans to upload RHCOS as an image family right?

GCP is so much better than AWS in this way though, in terms of having a global image identifier, not per region AMIs etc.

@cgwalters
Copy link
Member

Requiring simultaneous changes to a pipeline and cosa is unfortunate. How hard would it be to make this compatible?

That said since we did branch and this only affects cloud image uploads I don't care too much; we can easily get a cosa build and pipeline updated in the same day. But it's usually worth at least trying to "ratchet" changes in a safe way if it's not too hard.

@dustymabe
Copy link
Member Author

High level makes sense, though there are no plans to upload RHCOS as an image family right?

Right. No plans that I know of right now. Since the openshift-installer chooses the image for you I don't know if it would have any benefit.

GCP is so much better than AWS in this way though, in terms of having a global image identifier, not per region AMIs etc.

I know right? It's much better than having to track AMI IDs across regions.

Requiring simultaneous changes to a pipeline and cosa is unfortunate. How hard would it be to make this compatible?

That said since we did branch and this only affects cloud image uploads I don't care too much; we can easily get a cosa build and pipeline updated in the same day. But it's usually worth at least trying to "ratchet" changes in a safe way if it's not too hard.

I think I'd prefer to not introduce compatibility code but I could see doing something like:

# compat for RHCOS pipeline - remove after cloud-gcp.groovy is updated
# to pass --bucket without prepending `gs://`
if not bucket.startswith('gs://'):
    bucket = f"gs://{bucket}"

If you think it'd be worth it.

@cgwalters
Copy link
Member

/lgtm

@cgwalters
Copy link
Member

If you think it'd be worth it.

Carrying the two lines of code for a bit in order to avoid a potential pipeline stall would definitely be something I would have done, yes.

This should make it so our RHCOS pipeline won't hard fail until we get
it updated.
@dustymabe
Copy link
Member Author

If you think it'd be worth it.

Carrying the two lines of code for a bit in order to avoid a potential pipeline stall would definitely be something I would have done, yes.

added the suggested compat code - will merge once I get CI passing

@ashcrow
Copy link
Member

ashcrow commented Apr 7, 2020

/lgtm

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, bgilbert, cgwalters, dustymabe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ashcrow,bgilbert,cgwalters,dustymabe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit f2f6f65 into coreos:master Apr 7, 2020
@dustymabe dustymabe deleted the dusty-gcp-ore-work branch April 7, 2020 21:31
@cgwalters
Copy link
Member

RHCOS pipeline is failing with

+ ore gcloud --project rhcos-cloud --basename rhcos upload --force --bucket gs://rhcos --json-key /srv/.gce/gce.json --name rhcos-45-81-202004081017-0-gcp-x86-64 --file builds/45.81.202004081017-0/x86_64/rhcos-45.81.202004081017-0-gcp.x86_64.tar.gz --write-url /home/jenkins/workspace/rhcos-art-rhcos-4.5/tmp/tmp0h0dajhx/gcp-url
prefix not specified. Refusing to upload in root directory of bucket

@dustymabe
Copy link
Member Author

opened a downstream PR #885 to hopefully fix the problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants