Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

baremetal: RHCOS image for bootstrap needs to be able to downloaded disconnected #2597

Closed
stbenjam opened this issue Oct 30, 2019 · 28 comments
Closed
Labels
platform/baremetal IPI bare metal hosts platform

Comments

@stbenjam
Copy link
Member

stbenjam commented Oct 30, 2019

Platform:

baremetal

What happened?

It's already possible to use an alternative registry for container images, as well as for the masters' RHCOS images (via baremetal platform's CacheURL).

However, the bootstrap node uses the libvirt RHCOS image, which gets downloaded over the internet:

base, err := url.Parse(meta.BaseURI)

The baseURI is defined in rhcos.json.

What you expected to happen?

Some mechanism to override the baseURI or entire URL for the libvirt RHCOS image. The libvirt image is cached, however it still requires hitting the external URL to get the ETag.

resp, err := http.Get(uri)
if err != nil {
return uri, err
}
if resp.StatusCode != 200 {
return uri, fmt.Errorf("%s while getting %s", resp.Status, uri)
}
defer resp.Body.Close()
key, err := cacheKey(resp.Header.Get("ETag"))

As we know the SHA via rhcos.json, could we use that instead of the ETag for the cache path in $HOME/.cache/openshift-install/libvirt/image/? That way a user could prepopulate the cache.

@stbenjam
Copy link
Member Author

/label platform/baremetal

@openshift-ci-robot openshift-ci-robot added the platform/baremetal IPI bare metal hosts platform label Oct 30, 2019
@stbenjam
Copy link
Member Author

#2595

@hardys
Copy link
Contributor

hardys commented Nov 22, 2019

I chatted briefly with @abhinavdahiya about this, and it sounds like one possibility may be to add some install-config overrides to point to a locally cached image URL, but that these should be exposed only in the baremetal platform config section, not for all platforms.

If we could add that for both the qemu and openstack images that would probably be ideal, something like:

  platform:
    baremetal:
      bootstrapMachineOSImage: 
      machineOSImage:

Ideally this would be added along with some way to get the upstream URL for both bootimages, so the cache can be easily populated - this was discussed in #1399 and AFAICT is stalled so we'd probably have to continue to check rhcos.json directly in the meantime.

@hardys
Copy link
Contributor

hardys commented Nov 22, 2019

#2711

@stbenjam
Copy link
Member Author

I chatted briefly with @abhinavdahiya about this, and it sounds like one possibility may be to add some install-config overrides to point to a locally cached image URL, but that these should be exposed only in the baremetal platform config section, not for all platforms.

Hm, why make it specific to baremetal? What if other platforms need to get images in a disconnected setting?

There's a global config for mirroring release image content under imageContentSources, could we put it there (or a similar field for machine OS content) so RHCOS content could be retrieved from a local mirror as well?

@kirankt
Copy link
Contributor

kirankt commented Nov 25, 2019

I chatted briefly with @abhinavdahiya about this, and it sounds like one possibility may be to add some install-config overrides to point to a locally cached image URL, but that these should be exposed only in the baremetal platform config section, not for all platforms.

Hm, why make it specific to baremetal? What if other platforms need to get images in a disconnected setting?

There's a global config for mirroring release image content under imageContentSources, could we put it there (or a similar field for machine OS content) so RHCOS content could be retrieved from a local mirror as well?

@stbenjam @hardys Shouldn't this fix our issue?
https://github.com/openshift/installer/blob/master/pkg/tfvars/libvirt/libvirt.go#L38
https://github.com/openshift/installer/blob/master/pkg/tfvars/internal/cache/cache.go#L181

@stbenjam
Copy link
Member Author

stbenjam commented Nov 25, 2019

I chatted briefly with @abhinavdahiya about this, and it sounds like one possibility may be to add some install-config overrides to point to a locally cached image URL, but that these should be exposed only in the baremetal platform config section, not for all platforms.

Hm, why make it specific to baremetal? What if other platforms need to get images in a disconnected setting?
There's a global config for mirroring release image content under imageContentSources, could we put it there (or a similar field for machine OS content) so RHCOS content could be retrieved from a local mirror as well?

@stbenjam @hardys Shouldn't this fix our issue?
/pkg/tfvars/libvirt/libvirt.go@master#L38
/pkg/tfvars/internal/cache/cache.go@master#L181

That doesn't give us any mechanism to override the bootstrap image itself, that code just caches a downloaded copy of the image on the local filesystem where the installer is run.

So, you could pre-populate that cache directory, but in the Hive case I think that's kind of tricky. We'd have to mount some shared volume in the pod. It'd be much easier to override the bootstrap and masters images URL's to point to a local HTTP server with environment variables, or better as part of the install-config

@cgwalters
Copy link
Member

FWIW at some point we are likely to ship something like a "raw bootimage inside a container" to fix openshift/os#381

However, that still leaves the question of provisioning the bootstrap.

@kirankt
Copy link
Contributor

kirankt commented Nov 26, 2019

I chatted briefly with @abhinavdahiya about this, and it sounds like one possibility may be to add some install-config overrides to point to a locally cached image URL, but that these should be exposed only in the baremetal platform config section, not for all platforms.

Hm, why make it specific to baremetal? What if other platforms need to get images in a disconnected setting?

This was specifically mentioned by @abhinavdahiya that they don't have an use case for other platforms. The idea of env var was a no-no as well, so it was suggested to use the platform config section to add a new entry there.

There's a global config for mirroring release image content under imageContentSources, could we put it there (or a similar field for machine OS content) so RHCOS content could be retrieved from a local mirror as well?

Can we combine the ideas of using imageContentSources as well as using the baremetal platform config section to override the bootstrap machine OS image? That way, we'll have the image mirrored locally and they using the config setting, point to that URL?

Or we just iterate through the imageContentSources list and look for the qemu entry. Ugly perhaps.

@hardys
Copy link
Contributor

hardys commented Nov 26, 2019

Can we combine the ideas of using imageContentSources as well as using the baremetal platform config section to override the bootstrap machine OS image? That way, we'll have the image mirrored locally and they using the config setting, point to that URL?

+1 on this idea, if we could optionally point to mirrored bootimages for both the bootstrap and cluster images that would be ideal, but this doesn't resolve the concern from @abhinavdahiya re exposing this only for the baremetal platform so I guess we need some further feedback on that.

@stbenjam
Copy link
Member Author

I chatted briefly with @abhinavdahiya about this, and it sounds like one possibility may be to add some install-config overrides to point to a locally cached image URL, but that these should be exposed only in the baremetal platform config section, not for all platforms.

Hm, why make it specific to baremetal? What if other platforms need to get images in a disconnected setting?

This was specifically mentioned by @abhinavdahiya that they don't have an use case for other platforms. The idea of env var was a no-no as well, so it was suggested to use the platform config section to add a new entry there.

I’m fine if it has to live in the platform but seems like the installer team is backing themselves into a corner. Local image mirrors seems useful for any current or future ipi platform that doesn’t have rhcos preloaded (openstack, libvirt, vsphere, ovirt).

@tomassedovic
Copy link
Contributor

tomassedovic commented Nov 27, 2019

I think the OpenStack platform could use this too.

We could implement the disconnected install by uploading the RHCOS image into the OpenStack image store (glance) as a prerequisite and instructing the installer to use it via the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE env var.

But I suspect that var was meant for dev only and should not be required in the disconnected case. If so, we would need an alternative way for getting the image as well.

@kirankt
Copy link
Contributor

kirankt commented Dec 2, 2019

I think the OpenStack platform could use this too.

We could implement the disconnected install by uploading the RHCOS image into the OpenStack image store (glance) as a prerequisite and instructing the installer to use it via the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE env var.

But I suspect that var was meant for dev only and should not be required in the disconnected case. If so, we would need an alternative way for getting the image as well.

@abhinavdahiya Since there seems to be interest in other platforms as well, can we come to an agreement about adding this feature widely, rather than just making it live in baremetal platform only? We can probably even do away with the lone env variable if we make this widely available. Thoughts?

@abhinavdahiya
Copy link
Contributor

The long term plan for installer to be support/control the boot-image for bootstrap-host only. And the boot-image for cluster nodes must be derived from the release-image.. see openshift/os#381

Therefore we support one ENV OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE with vague API that it controls the boot-image and doesn't make the distinction between the bootstrap and control-plane etc..

So therefore we cannot add the OPENSHIFT_INSTALL_BOOTSTRAP_OS_IMAGE_OVERRIDE or such.

We could implement the disconnected install by uploading the RHCOS image into the OpenStack image store (glance) as a prerequisite and instructing the installer to use it via the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE env var.

If we need to support the override in API, we should be more precise, if it's glance image, so the API should be that, so that we can verify the inputs.

Also I think supporting URL + glance image is kinda not great because the user can host the URL locally and the installer can continue to create the glance image...

Since there seems to be interest in other platforms as well, can we come to an agreement about adding this feature widely, rather than just making it live in baremetal platform only?

depends on what you are proposing?

iterate through the imageContentSources list and look for the qemu entry

i'm sorry that goes against the more desirable boot-image from release image and I don't think we should follow this path.

@stbenjam
Copy link
Member Author

stbenjam commented Dec 2, 2019

iterate through the imageContentSources list and look for the qemu entry

i'm sorry that goes against the more desirable boot-image from release image and I don't think we should follow this path.

@abhinavdahiya For our purposes, we are not intending to change which image is being used, but rather where it's obtained from. My understanding is in the OpenStack case, this would not be changing where in glance the image is obtained from, but rather where the image is obtained to upload to glance (i.e., baseURI in rhcos.json).

For a disconnected install, we can use oc adm release mirror to put the container images up on a mirror, and then use imageContentSources to override the path those are downloaded from during an install to point to somewhere locally.

There's no equivalent option for RHCOS images. We really need the ability to change the source from https://releases-art-rhcos.svc.ci.openshift.org to an HTTP server that's availably locally, so this idea of using imageContentSources seems (to me) to go along with the approach of coupling the image with a release, rather than offering an alternative approach.

@abhinavdahiya
Copy link
Contributor

so this idea of using imageContentSources seems (to me) to go along with the approach of coupling the image with a release, rather than offering an alternative approach.

we already have a way to pull container images in disconnected env and creating another mirror fetch for boot-image is not ideal, we know how to get container images from different mirror we don't need another parallel to pull random-files from mirrors too.

@stbenjam
Copy link
Member Author

stbenjam commented Dec 2, 2019

so this idea of using imageContentSources seems (to me) to go along with the approach of coupling the image with a release, rather than offering an alternative approach.
we already have a way to pull container images in disconnected env and creating another mirror fetch for boot-image is not ideal, we know how to get container images from different mirror we don't need another parallel to pull random-files from mirrors too.

Right, but disconnected container images isn't useful if you can't also get the RHCOS images in a disconnected way for platforms that need this (openstack, baremetal IPI at least)

We're not creating another thing, I'm proposing using the existing thing (imageContentSources) or something similar to ensure if there's a matching configuration for RHCOS, then we would obtain the RHCOS image from there instead.

Maybe something like this:

- source: https://releases-art-rhcos.svc.ci.openshift.org/art/storage/
  mirrors:
  - http://local-mirror.com/rhcos/

@abhinavdahiya
Copy link
Contributor

Right, but disconnected container images isn't useful if you can't also get the RHCOS images in a disconnected way for platforms that need this (openstack, baremetal IPI at least)

that's what i'm saying we need to work towards openshift/os#381

We're not creating another thing, I'm proposing using the existing thing (imageContentSources) or something similar to ensure if there's a matching configuration for RHCOS, then we would obtain the RHCOS image from there instead.

You are creating an new thing..

// ImageContentSources lists sources/repositories for the release-image content.
// +optional
ImageContentSources []ImageContentSource `json:"imageContentSources,omitempty"`

The goal of that field and work flow is to provide alternate source for container images..
and we want it to extent to alternate source for HTTP blobs..

disconnected fetching content in form of container images is already supported transparently and re-using that mechanism to get the RHCOS image in running cluster is a lot more desirable and overloading this one-off.

@cgwalters
Copy link
Member

Note that ship bootimages via the release image can only work after we do something like #1286 - since otherwise where is the image for the bootstrap going to come from?

Or alternatively, we could have a separate public bootimage just for bootstrap which could be revved less frequently, and pivot from it.

I'm just saying that until we land a lot of work, we're going to get continual demand for short-term band-aids, and it's not clear to me that skipping the band-aids is worth it.

@abhinavdahiya
Copy link
Contributor

Note that ship bootimages via the release image can only work after we do something like #1286 - since otherwise where is the image for the bootstrap going to come from?
Or alternatively, we could have a separate public bootimage just for bootstrap which could be revved less frequently, and pivot from it.

I think we are more concerned about the cluster nodes.. and why I don't think bootstrap-host needs to be handled similarly #2542 (comment)

I'm just saying that until we land a lot of work, we're going to get continual demand for short-term band-aids, and it's not clear to me that skipping the band-aids is worth it.

I'm not against for the time being fixes, but would like changes that are still useful in the desired future.

@stbenjam
Copy link
Member Author

stbenjam commented Dec 2, 2019

disconnected fetching content in form of container images is already supported transparently and re-using that mechanism to get the RHCOS image in running cluster is a lot more desirable and overloading this one-off.

What is the mechanism for disconnected fetching of container images if it's not imageContentSources? Maybe I'm missing some information.

Whether it's a new thing or not seems to be a semantic difference, imageContentSources exists, and is used to override where the installer fetches things from, I thought it could be modified for a similar purpose. Or maybe a totally new httpContentSources that works the same way is appropriate.

My main concern was that I don't think this should be embedded in the baremetal platform as asked earlier, as it's generally relevant to others.

If we just need a band aid short term, couldn't we take the original approach of an env var (#2595)? It a lot simpler and could be removed later.

@kirankt
Copy link
Contributor

kirankt commented Dec 4, 2019

disconnected fetching content in form of container images is already supported transparently and re-using that mechanism to get the RHCOS image in running cluster is a lot more desirable and overloading this one-off.

What is the mechanism for disconnected fetching of container images if it's not imageContentSources? Maybe I'm missing some information.

@abhinavdahiya When you mention transparent disconnected fetching of images, you're not referring to image caching are you?
e.g.

// If the file has already been cached, return its path

If that's the case, @stbenjam mentioned earlier that it causes complications with the Hive.

Whether it's a new thing or not seems to be a semantic difference, imageContentSources exists, and is used to override where the installer fetches things from, I thought it could be modified for a similar purpose. Or maybe a totally new httpContentSources that works the same way is appropriate.

My main concern was that I don't think this should be embedded in the baremetal platform as asked earlier, as it's generally relevant to others.

If we just need a band aid short term, couldn't we take the original approach of an env var (#2595)? It a lot simpler and could be removed later.

@stbenjam
Copy link
Member Author

stbenjam commented Dec 4, 2019

@abhinavdahiya Could we get some clarification about what approach would be acceptable to the installer team? We must have to have a way to override the location bootstrap RHCOS is downloaded from.

@stbenjam
Copy link
Member Author

stbenjam commented Dec 5, 2019

To summarize, I believe the 3 options we've proposed are:

  1. An environment variable for bootstrap, OPENSHIFT_INSTALL_BOOTSTRAP_OS_IMAGE_OVERRIDE

  2. Make use of imageContentSources or add an httpContentSources option, which will do URL matching and could override the baseURI used in rhcos.json.

  3. Override the bootstrap URL by adding an explicit configuration item in the baremetal platform config

Is that correct? Keep in mind, this will need to be backportable to 4.3. 1 and 3 are probably the least risky to backport. I'm not thrilled about 3, as being able to override HTTP sources is generally applicable to other platforms (e.g. disconnected openstack which downloads the images from the baseURI locations and pushes them to glance).

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Dec 5, 2019

3. Override the bootstrap URL by adding an explicit configuration item in the baremetal platform config

personally i think this is the only short-term solution if you want to support a way to be disconnected for boot-images. we can't have users set ENV for their actual installs.

Is that correct? Keep in mind, this will need to be backportable to 4.3. 1 and 3 are probably the least risky to backport. I'm not thrilled about 3, as being able to override HTTP sources is generally applicable to other platforms (e.g. disconnected openstack which downloads the images from the baseURI locations and pushes them to glance).

@hardys
Copy link
Contributor

hardys commented Jan 22, 2020

This was closed via #2757

@hardys
Copy link
Contributor

hardys commented Jan 22, 2020

/close

@openshift-ci-robot
Copy link
Contributor

@hardys: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform/baremetal IPI bare metal hosts platform
Projects
None yet
Development

No branches or pull requests

7 participants