
Define an Azure schema #13

Closed
cgwalters opened this issue Jan 22, 2021 · 13 comments · Fixed by #16

Comments

@cgwalters
Member

FCOS isn't uploading to Azure right now, but RHCOS is. We need to define a stream schema for this.

@cgwalters
Member Author

cgwalters commented Jan 25, 2021

There's a lot of history here; important PRs seem to be:
coreos/coreos-assembler#620
openshift/installer#1976

A big intersection point here is using public images. I think we'd like to use them for OpenShift 4 in general. Today FCOS is in GCP more "officially"; there's a subthread on that, e.g. coreos/fedora-coreos-tracker#147 (comment). FCOS is actually on the shortlist for the "Public Image" dropdown, which is cool.
We are not yet doing that for RHCOS.

FCOS is not in Azure in any official way yet; that's coreos/fedora-coreos-tracker#148

So... my first instinct is that we start with something like this (in YAML since it's easier to type):

architectures:
  x86_64:
    azure:
      url: https://rhcos.blob.core.windows.net/imagebucket/rhcos-47.83.202012030221-0-azure.x86_64.vhd

i.e. basically just the same as what ended up in the cosa meta.json. If at some point we end up in the Marketplace, we can adjust.
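If it helps to see how that might land in this repo's Go types, here's a minimal sketch; the struct and field names below are illustrative assumptions on my part, not a final schema:

// Sketch only: names here are assumptions, not necessarily what
// stream-metadata-go will ship.
package stream

// Azure describes a bootimage uploaded as a VHD to Azure blob storage.
type Azure struct {
	// URL points at the uncompressed VHD blob.
	URL string `json:"url"`
}

// PlatformImages collects the per-cloud artifacts for one architecture.
type PlatformImages struct {
	Azure *Azure `json:"azure,omitempty"`
	// other platforms (aws, gcp, ...) elided
}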

However... what I'm still trying to figure out is whether there's an "AMI equivalent" for Azure - basically a way to offer a public image that can be directly launched and doesn't need to be copied into the target "resource group", which is what at least openshift-install is doing today with Terraform.

@cgwalters
Member Author

cgwalters commented Jan 25, 2021

Hmm, actually based on this doc it looks to me like we can just directly create a VM from a VHD in a storage bucket... so why is the installer copying it? OK, the commit says:

The public vhd is copied to cluster's resource group to make sure the image is not based directly on a public artifact.

But what would be the problem with it being a "public artifact"?

@lucab

lucab commented Jan 25, 2021

However... what I'm still trying to figure out is whether there's an "AMI equivalent" for Azure - basically a way to offer a public image that can be directly launched and doesn't need to be copied into the target "resource group", which is what at least openshift-install is doing today with Terraform.

Looking at CL docs, apparently there is a format for references to public images, which looks like CoreOS:CoreOS:Stable:latest.

@crawford

@lucab that URI is only usable by images in the Marketplace, as I recall.

@cgwalters we're copying the managed images because of performance limitations. This doc mentions that "one managed image supports up to 20 simultaneous deployments".

@cgwalters
Member Author

@crawford Should we pursue Marketplace for RHCOS, or just live with the status quo?

There also seems to be a middle ground using shared image galleries, but it's not clear to me whether they support "publishing" publicly. I found this doc about image galleries across tenants, but that's for sharing with just a few tenants, not the one-publisher-to-many-consumers model we want.

@cgwalters
Member Author

(Hmm I guess the installer could internally create an image gallery at least, which would better handle geo-replication and garbage collection)

@crawford

We want to stay away from the Marketplace. That's designed to sell VMs directly to customers, but since RHCOS isn't useful outside of OpenShift, it sets up folks to get confused and frustrated when they try to launch RHCOS VMs directly.

Remember, long term, we want OpenShift managing the VM images. It's probably not worth the effort to migrate from the existing implementation to image galleries, only to migrate again to our eventual pattern. We can revise that stance if we find that customers are running into these performance limits.

cgwalters added a commit to cgwalters/stream-metadata-go that referenced this issue Jan 25, 2021
This contains just a URL for now. It's what RHCOS does
today and will likely continue doing for the near future.
For FCOS we may end up doing something similar too.

Closes: coreos#13
@cgwalters
Member Author

PR in #14

cgwalters added a commit to cgwalters/coreos-assembler that referenced this issue Jan 25, 2021
cgwalters added a commit to cgwalters/stream-metadata-go that referenced this issue Jan 25, 2021
cgwalters added a commit to cgwalters/coreos-assembler that referenced this issue Jan 26, 2021
@cgwalters
Member Author

Remember, long term, we want OpenShift managing the VM images. It's probably not worth the effort to migrate from the existing implementation to image galleries, only to migrate again to our eventual pattern. We can revise that stance if we find that customers are running into these performance limits.

But as long as we have a bootstrap node that is RHCOS, we can't get away from this problem. There are things we could do, e.g. support doing the bootstrap from a traditional RHEL system too, but that would require sticking crio/kubelet in a container that we pull and extract, reworking the installer to use cloud-init, etc. ... messy.

@crawford

What is the problem you're alluding to?

@cgwalters
Member Author

cgwalters commented Jan 26, 2021

Every run of openshift-install today creates a bootstrap node - that has to be booted from an image. If we're booting a public image (though not necessarily a publicized one), it just works better in public clouds because it's all generally mirrored and replicated automatically. To rephrase my above comment: if we moved away from the "bootstrap must be the exact RHCOS version" model towards e.g. supporting running the bootstrap flow in a privileged podman container, it would break the bootstrap ⇔ RHCOS cycle.

There is pressure towards using public images from a few areas; e.g. GCP recently greatly slowed down the rate at which one can create custom images, which broke our CI, which spawns clusters constantly.
See at least openshift/installer#3808 and coreos/coreos-assembler#1610. There's a GCP doc or blog entry on this that Abhinav probably knows.

On the other hand, using public images by default in openshift-install increases the delta WRT private cloud/onprem, which is really the larger story around the enhancement.

Dunno.

I guess for now, one thing we could do is create a separate "RHCOS special" section in stream metadata and stick this Azure info there? IOW, again in YAML form:

architectures:
  x86_64:
    rhcos-4.8-private:
      azure:
        url: https://rhcos.blob.core.windows.net/imagebucket/rhcos-47.83.202012030221-0-azure.x86_64.vhd

The idea here is that rhcos-4.8-private would be specific to OpenShift 4.8, would only be used by openshift-install and coreos-assembler (for single-OS CI runs), and wouldn't appear in the FCOS metadata.
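As a rough Go sketch of that extension idea (the type names here are purely illustrative, only the rhcos-4.8-private key comes from the YAML above):

// Sketch only: illustrating the "RHCOS special" extension section,
// not a final schema.
package stream

// PrivateAzure mirrors the public azure entry but lives under a
// distribution-specific key that generic FCOS consumers would ignore.
type PrivateAzure struct {
	URL string `json:"url"`
}

// PrivateImages would hang off each architecture, keyed by an opaque
// name such as "rhcos-4.8-private", and only be consumed by
// openshift-install and coreos-assembler.
type PrivateImages map[string]struct {
	Azure *PrivateAzure `json:"azure,omitempty"`
}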

@crawford

I'm having trouble following your line of reasoning. Let me try explaining your argument, as I understand it.

You're looking for an identifier that we can use to record the bootimage for Azure (e.g. blob URL, image URI). Digging deeper, you noticed that the image we use (located by the blob URL) is being copied into the target Resource Group and you'd prefer that we instead make use of a "public image" (something akin to AWS public AMIs) in order to avoid running into cloud-specific, image-creation restrictions.

I'm with you up to this point, but I don't understand why this problem is specific to the bootstrap node. I would expect all of our nodes to start from the same bootimage. Since we need to replicate the image for our control and compute nodes, why wouldn't we just also use the same image for the bootstrap?

@cgwalters
Member Author

You're absolutely right.

What I am arguing here is somewhat contradictory, and I think the root problem is basically that we're being pulled in different directions by the efficiency gains of public images in public clouds versus needing to handle onprem/metal. Another tension is "RHCOS is part of OpenShift only" versus "FCOS is general". And yet another tension point is that what we're doing today for RHCOS on some clouds (AWS) works fundamentally differently than on others (Azure). But at this point I'm just re-re-capping the thread again, so I'll stop 😄

So ummm... I think my proposal is that we just stick an "extension section" in the stream metadata for now and use it for RHCOS. Maybe we could do the same with FCOS - perhaps we should, at least to avoid ingress/egress costs for us and users. And if we go that route, perhaps it isn't an extension section?

cgwalters added a commit to cgwalters/coreos-assembler that referenced this issue Feb 9, 2021
First this changes ore to write a JSON file that the Python side
parses, so we don't need to re-synthesize the URL on the Python
side.

Gather Azure Blob metadata (size+md5) and add that to our metadata.
This is mainly useful so that one can use the Azure API to "offline validate"
against our metadata snapshot.  I'd also like to add our sha256
checksum for consistency, but that can come later.

This is prep for coreos/stream-metadata-go#13
cgwalters added a commit to cgwalters/stream-metadata-go that referenced this issue Feb 11, 2021
This contains just a URL for now because that's all
the current RHCOS cosa metadata has.  I'm trying
to add e.g. the Azure Blob storage md5 information as
well as the full size+sha256, but in practice this
data is just the uncompressed VHD; anyone who wants
to do "offline" verification outside of Azure can
replicate that.

For FCOS we may end up uploading the image too,
though there we hope to end up in the Marketplace.

For now, let's stick this off an explicit extension area.

Closes: coreos#13
cgwalters added a commit to cgwalters/stream-metadata-go that referenced this issue Feb 16, 2021