Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new rhcos-metadata command #2092

Closed
wants to merge 1 commit into from

Conversation

cgwalters
Copy link
Member

Today the installer pins RHCOS, and that's not likely
to change. The embedded version isn't easy to know in
pre-release/development scenarios.

Further, the way we do releases via CI, we don't know
what the "re-branded OpenShift version" (e.g. 4.1.0)
will be until it's released.

For other things like the AMIs, today the documentation
writers copy them into the docs manually.

Since the installer is the canonical source of this data,
let's make it easy to extract.

We're not documenting this much yet, but having this
data easily available is going to be extremely helpful
for the variety of cases mentioned above and allow
other teams to have a canonical way to get it.

Related: #1399

Today the installer pins RHCOS, and that's not likely
to change.  The embedded version isn't easy to know in
pre-release/development scenarios.

Further, the way we do releases via CI, we don't know
what the "re-branded OpenShift version" (e.g. 4.1.0)
will be until it's released.

For other things like the AMIs, today the documentation
writers copy them into the docs manually.

Since the installer is the canonical source of this data,
let's make it easy to extract.

We're not documenting this much yet, but having this
data easily available is going to be extremely helpful
for the variety of cases mentioned above and allow
other teams to have a canonical way to get it.

Related: openshift#1399
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 25, 2019
@abhinavdahiya
Copy link
Contributor

I don't think giving users access to the rhcos.json is a good idea. the only detail that we should move towards is the release image only, everything should be based of that.

@cgwalters
Copy link
Member Author

That's simply not going to happen for 4.2 though.

@cgwalters
Copy link
Member Author

Happy to "hide" this more; if it's an undocumented openshift-install version --output-rhcos-meta option or something that's fine by me too.

@cgwalters
Copy link
Member Author

At least hopefully in 4.2. the git commit hash output by openshift-install version will be correct, and one will be able to look at git for the data. In 4.1 as part of researching the bootimage update issue, the git hash was useless so I ended up trying to use strings on the openshift-install binary and belatedly discovered that go-bindata zlib compresses things which greatly escalated the difficulty of extracting this... I just don't want to have to do something like that ever again.

@abhinavdahiya
Copy link
Contributor

If we take the use-case UPI users can use it.

How will this be used?

@cgwalters
Copy link
Member Author

See this bug for example: https://bugzilla.redhat.com/show_bug.cgi?id=1730910
Where the user was searching for AMIs by description. I want them to instead run this command.
That's relevant for UPI EC2.

Now, UPI in general, particularly e.g. OpenStack and bare metal...I am not claiming this command solves all problems. In particular for releases if we want to redirect people to e.g. mirror.openshift.com we're going to probably have to do the same "binary surgery" we do for the release payload. Which would...obviously be cleaner if the data was in the release payload - and we're all in agreement that would be better! But we have two PRs on this that were stalled/rejected and if we want it to happen it needs to be a 4.3 priority.

In the meantime for 4.2...having a canonical/hidden command that dumps this JSON gets us into a better place, if obviously still far from ideal.

@abhinavdahiya
Copy link
Contributor

See this bug for example: https://bugzilla.redhat.com/show_bug.cgi?id=1730910
Where the user was searching for AMIs by description. I want them to instead run this command.
That's relevant for UPI EC2.

Now, UPI in general, particularly e.g. OpenStack and bare metal...I am not claiming this command solves all problems. In particular for releases if we want to redirect people to e.g. mirror.openshift.com we're going to probably have to do the same "binary surgery" we do for the release payload. Which would...obviously be cleaner if the data was in the release payload - and we're all in agreement that would be better! But we have two PRs on this that were stalled/rejected and if we want it to happen it needs to be a 4.3 priority.

In the meantime for 4.2...having a canonical/hidden command that dumps this JSON gets us into a better place, if obviously still far from ideal.

If there is no use-case outside of UPI users need this, I don't think this command makes sense.

Also I don't like the sound of hiding it. flag or sub command this becomes part of the api and I'm not convinced with the use-case right now.

@cgwalters
Copy link
Member Author

cgwalters commented Jul 25, 2019

I know this is "your" project and not the RHCOS' team's project but...the installer pins RHCOS and the RHCOS team needs to field questions and issues related to the installer/RHCOS relationship, and I feel that gives us a say in this. I've responded to many issues here and again - I'm not just writing this PR because I have spare time. It will help me and others - enough so that I think it's worth carrying. If it's hidden we're within our rights to drop it later.

However, the lack of machine-readable AMIs is definitely something I think UPI AWS users would want to script, and so there's certainly a possibility it would become hard to change; but that said I can't imagine that we'd break the format, it's defined by coreos-assembler and we're already filtering it to a subset of keys.

@abhinavdahiya
Copy link
Contributor

I know this is "your" project and not the RHCOS' team's project but...the installer pins RHCOS and the RHCOS team needs to field questions and issues related to the installer/RHCOS relationship, and I feel that gives us a say in this. I've responded to many issues here and again - I'm not just writing this PR because I have spare time. It will help me and others - enough so that I think it's worth carrying.

I not against anything that would help RHCOS team fix issues. And that was the reason I asked what was the use-case outside of "UPI users will use it" (because UPI users should be getting the source-of-truth for bootimages from different sources) ...

@patrickdillon
Copy link
Contributor

If this were to be included, I think it might make more sense to include it under the gathercommand rather than creating a new command. We'd need to tweak the --help page.

@sferich888
Copy link
Contributor

Support bus infavor of making this information more accessible, it helps us better explain the relationship of pieces and reduce the support burden.

@DanyC97
Copy link
Contributor

DanyC97 commented Aug 9, 2019

will this PR progress or is on hold for now ? if so will be good to have the appropriate label

@sferich888
Copy link
Contributor

Install instructions like https://github.com/openshift/installer/tree/master/docs/user/openstack#rhcos-image are not effective without a tool like this.

@cgwalters
Copy link
Member Author

If this were to be included, I think it might make more sense to include it under the gathercommand rather than creating a new command. We'd need to tweak the --help page.

What command do you want there specifically? gather offhand feels weird since it's about debugging a failed install, this is something you run before an install but on the other hand I'd rather land something than have this PR rot, so let me know what you prefer!

@cgwalters
Copy link
Member Author

cgwalters commented Aug 19, 2019

If this PR doesn't land, I will keep telling people to do the following:

  1. Run openshift-install version and look for built from commit 3ab66dec4cbcdc7339e05f9e5a6e27dac3229e47.
  2. Take that sha1 hash and use it like this: https://github.com/openshift/installer/blob/3ab66dec4cbcdc7339e05f9e5a6e27dac3229e47/data/data/rhcos.json

Or if it's more convenient, you can find the latest version in a release branch, for example:

These instructions will of course break if we change the implementation of how we pin RHCOS in the installer, which is unlikely for 4.2 at least.

One thing I think we want here is a "pre-release" job that polls the installer git master (and branches) and updates https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/ with it.

@stbenjam
Copy link
Member

stbenjam commented Aug 19, 2019

Huge +1 from me. This would be very useful for baremetal IPI development, we currently manually follow the version in our dev-scripts environment to pre-download the images on our development hosts to speed up deployment time, because it's such a hassle to extract the data when using installer images, I guess we'd have to do something like above ^

@patrickdillon
Copy link
Contributor

patrickdillon commented Aug 19, 2019

What command do you want there specifically? gather offhand feels weird since it's about debugging a failed install, this is something you run before

I would suggest something like gather rhcos-metadata. If it's weird, maybe it's not a good idea. I'm not sure if anything beyond the help docs defines/limits the scope of the gather command. The semantics of the command sound fine to me, and it seems in the ballpark of gather, but this is just an idea and not something I feel strongly about.

I'd rather land something than have this PR rot, so let me know what you prefer!

I don't think this suggestion is holding up the PR.

@wking
Copy link
Member

wking commented Aug 19, 2019

Also in this space, OSD folks currently hard-code m5.xlarge control planes nodes because they don't have a way to ask a given installer (via Hive?) "what control-plane instance type would you suggest for this region?" . And they want to know so they can preflight instance-type limits. Dunno if we want to construct an approach that can address that use-case too.

@abhinavdahiya
Copy link
Contributor

Also in this space, OSD folks currently hard-code m5.xlarge control planes nodes because they don't have a way to ask a given installer (via Hive?) "what control-plane instance type would you suggest for this region?" . And they want to know so they can preflight instance-type limits. Dunno if we want to construct an approach that can address that use-case too.

hmm, if they want the installer to provide those shouldn't they just not set them? Or if they think they need separate defaults they should handle the decision outside..

@abhinavdahiya
Copy link
Contributor

If this PR doesn't land, I will keep telling people to do the following:

1. Run `openshift-install version` and look for `built from commit 3ab66dec4cbcdc7339e05f9e5a6e27dac3229e47`.

2. Take that sha1 hash and use it like this: https://github.com/openshift/installer/blob/3ab66dec4cbcdc7339e05f9e5a6e27dac3229e47/data/data/rhcos.json

These instructions will of course break if we change the implementation of how we pin RHCOS in the installer, which is unlikely for 4.2 at least.

One thing I think we want here is a "pre-release" job that polls the installer git master (and branches) and updates https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/latest/ with it.

That value is going to be almost always in correct for debugging failures purpose because the nodes might have pivoted to another version???

@abhinavdahiya
Copy link
Contributor

If anybody want to know what kube-apiserver version will be deployed by an installer, they run

openshift-install version
oc adm release info <release-image from above>

Why is the RHCOS version any different? In which cases do we think the embedded data in the installer actually going to help?

@cgwalters
Copy link
Member Author

That value is going to be almost always in correct for debugging failures purpose because the nodes might have pivoted to another version???

This isn't about debugging failures, and the target machine-os-content isn't relevant here; this is about the installer-pinned bootimages which people should be starting from. If I want to upload an image to OpenStack, I want something like this command.

@abhinavdahiya
Copy link
Contributor

That value is going to be almost always in correct for debugging failures purpose because the nodes might have pivoted to another version???

This isn't about debugging failures, and the target machine-os-content isn't relevant here; this is about the installer-pinned bootimages which people should be starting from. If I want to upload an image to OpenStack, I want something like this command.

OpenStack for 4.2 requires users to perform UPI like action of getting the image from the mirrors.. which they are working to remove for 4.3.. They understand the docs burden when they decided that. And adding a feature for short period / shortcoming for a platform that will be fixed soon is in my opinion not ideal.

@sferich888
Copy link
Contributor

UPI installs are not going to go away! As such this is not a short term problem.

@wking
Copy link
Member

wking commented Aug 20, 2019

hmm, if they want the installer to provide those shouldn't they just not set them?

They want to be able to preflight "does your account have room for the n ${CONTROL_PLANE_INSTANCE_TYPE} instances you'll need?" and similar checks before telling Hive to launch the installer. We don't preflight that sort of thing ourselves, because it's racy, but I don't have a problem exposing our default choices so others can run racy preflight's ;)

@cgwalters
Copy link
Member Author

See also https://jira.coreos.com/browse/ART-857

A simple way to say all of this is - since the installer is the "source of truth" for our default IPI cloud cases, I really want the other UPI cases to be able to clearly and consistently reference that data as a canonical baseline.

Now we will almost certainly end up for 4.2 with e.g. a new list of AMIs in Asciidoc in the documentation. (Although personally I'd rip that out and tell users to run this command, since anyone scripting things will want a machine-readable representation, and the cosa JSON is fairly human-readable and self-describing - but anyways that's orthogonal). The person writing the docs should run this command. The ART team synchronizing content elsewhere should run this command.

It should be the source of truth.

@stbenjam
Copy link
Member

Is there any update on this? Baremetal IPI would really like to see this added to the installer.

@markmc
Copy link
Contributor

markmc commented Sep 3, 2019

Is there any update on this? Baremetal IPI would really like to see this added to the installer.

Just noticed your approach in openshift-metal3/dev-scripts#778 which I think is worth mentioning here:

# Get the git commit that the openshift installer was built from
OPENSHIFT_INSTALL_COMMIT=$($OPENSHIFT_INSTALLER version | grep commit | cut -d' ' -f4)

# Get the rhcos.json for that commit
OPENSHIFT_INSTALLER_RHCOS=${OPENSHIFT_INSTALLER_RHCOS:-https://raw.githubusercontent.com/openshift/installer/$OPENSHIFT_INSTALL_COMMIT/data/data/rhcos.json}

# Get the rhcos.json for that commit, and find the baseURI and openstack image path
RHCOS_IMAGE_JSON=$(curl "${OPENSHIFT_INSTALLER_RHCOS}")
export RHCOS_INSTALLER_IMAGE_URL=$(echo "${RHCOS_IMAGE_JSON}" | jq -r '.baseURI + .images.openstack.path')

AIUI, there are currently 2 cases where bare metal IPI needs the RHCOS image URL:

  1. To create a configmap for baremetal-operator - see WIP: Add baremetal operator configmap to installer. #2149 - this need should go away in 4.3

  2. To populate a local cache with the appropriate RHCOS image before launching the installer - this is optional, but super useful particularly for developers. The alternative would be to make this a pull-through cache.

@sdodson
Copy link
Member

sdodson commented Sep 9, 2019

I've read through all the comments here and I fear that putting this into the installer makes it appear authoritative for resources that it only requires for itself during IPI installations. Given there are well understood methods to cross reference an installer hash to image metadata, that we don't believe those will be invalidated anytime soon, and that we guarantee that a newer image within a given X.Y is compatible with an older installer this doesn't seem necessary at this point. This also doesn't provide the complete set of data required of ART in order to for their processes so it would still need to be augmented with external data.

I'm going to close this for now. I'm happy to re-evaluate if we find the situation for UPI untenable but right now I don't feel like we have enough data to support that conclusion.
/close

@openshift-ci-robot
Copy link
Contributor

@sdodson: Closed this PR.

In response to this:

I've read through all the comments here and I fear that putting this into the installer makes it appear authoritative for resources that it only requires for itself during IPI installations. Given there are well understood methods to cross reference an installer hash to image metadata, that we don't believe those will be invalidated anytime soon, and that we guarantee that a newer image within a given X.Y is compatible with an older installer this doesn't seem necessary at this point. This also doesn't provide the complete set of data required of ART in order to for their processes so it would still need to be augmented with external data.

I'm going to close this for now. I'm happy to re-evaluate if we find the situation for UPI untenable but right now I don't feel like we have enough data to support that conclusion.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member Author

Given there are well understood methods to cross reference an installer hash to image metadata, that we don't believe those will be invalidated anytime soon

I would definitely like to fix bootimage updates in the 4.3 timeframe, and that may imply changes to how we pin rhcos.

If nothing else, the method above unnecessarily exposes implementation details about how the "bindata" stuff in data gets put in the installer binary.

This also doesn't provide the complete set of data required of ART in order to for their processes so it would still need to be augmented with external data.

What external data?

FWIW I just replied on the ART card.

@cgwalters
Copy link
Member Author

and that we guarantee that a newer image within a given X.Y is compatible with an older installer

But as noted in the ART card, as an RHCOS developer I don't want to need to think about potentially two different bootimage versions per release (one installer pinned, one via some other process).

@cgwalters
Copy link
Member Author

@openshift-ci-robot
Copy link
Contributor

@cgwalters: Reopened this PR.

In response to this:

/reopen
https://bugzilla.redhat.com/show_bug.cgi?id=1788240

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign sdodson
You can assign the PR to them by writing /assign @sdodson in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Copy link
Member Author

The reality is that since 4.1, the installer and RHCOS are attached together by this - I think everyone involved agrees that we should improve the architecture and do something release image based, but the reality is that unless several people start working on that now, it's not going to happen for 4.4. And it's not clear to me we even have it set up for 4.5.

People must not hardcode random versions of the AMI etc. used by RHCOS. That has (a number of times) and will continue to lead to no end of pain.

We made the executive decision for 4.1 to pin RHCOS in the installer. I'll be happy to maintain this code going forward.

@stbenjam
Copy link
Member

stbenjam commented Jan 7, 2020

We really need this for the baremetal platform, as we provide a way for users to override the image sources to be retrieved locally but we don't give them any way to find out what to download. It would be nice to have some more human readable output, but exposing the information at all is hugely valuable to us.

@cgwalters
Copy link
Member Author

cgwalters commented Jan 7, 2020

A possible alternative would be having meta.json at https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/

We want that anyways - we could try to make that canonical and have the installer version derive from it...

Thinking about this a bit more, one downside is that we really want Prow presumbmits that kick off various things off on RHCOS bootimage bumps...and the ART stuff that causes things to appear at that location aren't Prow based. Hmm, I guess maybe we could try to move it to https://github.com/openshift/os/ and teach ART to pull from there?

But then we'd need a bot which auto-updates the pinned version here...

@openshift-ci-robot
Copy link
Contributor

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws 2a324b0 link /test e2e-aws
ci/prow/e2e-aws-scaleup-rhel7 2a324b0 link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-libvirt 2a324b0 link /test e2e-libvirt

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya
Copy link
Contributor

closing as openshift/enhancements#201 is moving in different direction.

/close

@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: Closed this PR.

In response to this:

closing as openshift/enhancements#201 is moving in different direction.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants