
Packages upload to ghcr #208

Draft: Hind-M wants to merge 1 commit into main
Conversation

@Hind-M (Member) commented Oct 25, 2022

Checklist

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@hmaarrfk (Contributor)

I'm mostly passing by, but can you share some context on what this is? Is there another issue open about this?

@Hind-M (Member, Author) commented Oct 26, 2022

I'm mostly passing by, but can you share some context on what this is? Is there another issue open about this?

Hey! So this is related to the requested feature of uploading packages to the GitHub Container Registry (ghcr.io) in addition to anaconda.org.
Another PR is also related and should probably be merged before this one.

@beckermr (Member) left a review comment

This upload cannot go here. We have to do it via the webservices in order to verify the artifacts.

@Hind-M (Member, Author) commented Nov 9, 2022

This upload cannot go here. We have to do it via the webservices in order to verify the artifacts.

Oh, OK! Where exactly do you suggest doing it? Is it somewhere here, somewhere else, or in another repo?
Thanks!

@beckermr (Member) commented Nov 9, 2022

Somewhere else completely. We'll need to do it either on the Heroku server or via a dispatch to GitHub Actions.

cc @wolfv for visibility

@wolfv (Member) commented Nov 10, 2022

Yeah, I think there are still quite a few considerations to work through in terms of where to put this functionality.

Regarding verification, one could also do that via repodata (which is not automatically generated at this point). The package could be uploaded to the OCI registry but only added to the repodata after passing the validation step (and otherwise be removed from the OCI registry again). Just a thought.

It would be cool, though, to start putting together a standalone feedstock that does the upload-after-build to the OCI registry.

If we want to do the upload on the Heroku server, then this is probably the relevant code: https://github.com/conda-forge/conda-forge-webservices/blob/ac84983eb66239c8d3bd6f5fb8b3297f709d2f8d/conda_forge_webservices/webapp.py#L498
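
A minimal sketch of the upload-first, index-after-validation flow described above, with hypothetical helpers; none of this exists in conda-forge today:

```python
# Hypothetical sketch of "upload first, index after validation".
# All three helpers are placeholders, not existing conda-forge code.

def validate_package(ref: str) -> bool:
    # Placeholder: run artifact checks (hashes, metadata, provenance).
    raise NotImplementedError

def append_to_repodata(ref: str) -> None:
    # Placeholder: add the package entry to the generated repodata.json.
    raise NotImplementedError

def delete_from_oci(ref: str) -> None:
    # Placeholder: remove the artifact from the OCI registry again.
    raise NotImplementedError

def process_staged_upload(ref: str) -> None:
    # The package already sits in the OCI registry but is invisible to
    # solvers until repodata lists it.
    if validate_package(ref):
        append_to_repodata(ref)  # package becomes installable
    else:
        delete_from_oci(ref)     # failed validation: drop the artifact
```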

@beckermr (Member)

So the Heroku server can't do the upload itself; it'd grind to a halt. We'll need to dispatch out to another service, or we need to stage into one OCI registry and copy to another via an API call.

@wolfv (Member) commented Nov 10, 2022

We can also use tags (e.g. 0.25.2_blabla_staging) and then just change the tag.
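
For illustration, promoting by re-tagging could look roughly like this against the OCI distribution API; the `/v2/` endpoints come from the spec, while the registry, repository and token values are placeholders:

```python
# Sketch of "promote by re-tagging" via the OCI distribution API.
# REGISTRY, REPO and TOKEN are placeholders.
import requests

REGISTRY = "https://ghcr.io"
REPO = "channel-mirrors/some-package"  # hypothetical repository
TOKEN = "..."                          # registry bearer token
MANIFEST = "application/vnd.oci.image.manifest.v1+json"

# 1. Fetch the manifest currently published under the staging tag.
resp = requests.get(
    f"{REGISTRY}/v2/{REPO}/manifests/0.25.2_blabla_staging",
    headers={"Authorization": f"Bearer {TOKEN}", "Accept": MANIFEST},
)
resp.raise_for_status()

# 2. Push the same manifest bytes under the final tag. Blobs are
#    content-addressed, so nothing gets re-uploaded by this step.
requests.put(
    f"{REGISTRY}/v2/{REPO}/manifests/0.25.2",
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": MANIFEST},
    data=resp.content,
).raise_for_status()
```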

@beckermr (Member)

As long as we don't ship repodata pointing to tags, that'd be fine.

@beckermr (Member)

Actually, I'm not sure labels/tags will work. We shouldn't have keys that can upload to our registry sitting in feedstocks out in the open. We need a staging area and then a secured copy.

@Hind-M (Member, Author) commented Nov 22, 2022

IIUC, we could upload to ghcr.io the same way it is done with anaconda.org, using a staging area and then copying to prod, couldn't we?
If so, we could/should keep the upload in upload_or_check_non_existence.py in this repo, and add the copy part from cf-staging to conda-forge (plus whatever else is missing) in the webservices (webapp.py)?

@beckermr (Member) commented Nov 22, 2022

Yes, a staging area could work. However, remember that the copy from cf-staging to conda-forge on anaconda.org is a single HTTP request made to anaconda.org once the package data has been validated. We never download and re-upload packages. So to make the ghcr setup work on our webservices instance, you'll need to find a similar HTTP API endpoint. That endpoint also needs to return the package hash for validation.
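
For reference, a sketch of what that server-side copy looks like today, assuming anaconda-client's Binstar.copy API (the exact signature may differ):

```python
# Sketch of the existing cf-staging -> conda-forge copy on anaconda.org,
# assuming anaconda-client's Binstar.copy API; exact signature may differ.
from binstar_client.utils import get_server_api

api = get_server_api(token="...")  # admin token, never exposed to feedstocks

# A single server-side request: anaconda.org moves the file between
# channels itself, so nothing is downloaded or re-uploaded by us.
api.copy(
    "cf-staging",            # from this owner...
    "some-package",          # hypothetical package name
    "1.0.0",
    to_owner="conda-forge",  # ...to this one
)
```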

@DerThorsten

I am trying to figure out what would be needed to move forward with the GitHub OCI upload:

Yes, a staging area could work. However, remember that the copy from cf-staging to conda-forge on anaconda.org is a single HTTP request made to anaconda.org once the package data has been validated. We never download and re-upload packages. So to make the ghcr setup work on our webservices instance, you'll need to find a similar HTTP API endpoint. That endpoint also needs to return the package hash for validation.

I am relatively new to the world of OCI registries, so forgive me if I am confusing things :) but I tried to look into the specs to find such an API endpoint. The OCI distribution spec mentions an endpoint which might help avoid a download/re-upload:

"If a necessary blob exists already in another repository within the same registry, it can be mounted into a different repository via a POST request [...]"
https://github.com/opencontainers/distribution-spec/blob/main/spec.md#mounting-a-blob-from-another-repository
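
A sketch of what that cross-repository mount could look like as a raw HTTP call; the endpoint is from the distribution spec, while the repository names, digest and token are placeholders:

```python
# Cross-repository blob mount per the OCI distribution spec:
# POST /v2/<target>/blobs/uploads/?mount=<digest>&from=<source>.
# Repository names, digest and token are placeholders.
import requests

REGISTRY = "https://ghcr.io"
TARGET = "channel-mirrors/some-package"          # hypothetical prod repo
SOURCE = "channel-mirrors-staging/some-package"  # hypothetical staging repo
DIGEST = "sha256:..."                            # digest of the staged blob

resp = requests.post(
    f"{REGISTRY}/v2/{TARGET}/blobs/uploads/",
    params={"mount": DIGEST, "from": SOURCE},
    headers={"Authorization": "Bearer ..."},
)
# 201 Created: the blob was mounted without any data transfer.
# 202 Accepted: the registry fell back to a regular upload session.
print(resp.status_code)
```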

@beckermr (Member)

Sure, that looks promising, but I know nothing about OCI registries. I'll leave this to you and @wolfv to work out. Ideally, we could wrap the copy in the conda OCI package @wolfv has going so it is easy to use.

We have some security requirements here related to tokens that I will share with @wolfv privately once the copy is working.

@jaimergp (Member) commented Jun 2, 2023

I've been thinking about this and doing some research. This is not a definitive assessment but a work in progress. I am not saying all of the following is a good idea, but at least it takes us into the realm of what's feasible today.


The main concern right now is how to do staging in a safe way. conda-forge uses the cf-staging Anaconda.org channel, where all feedstocks upload their artifacts. If the artifacts pass validation, a webservice copies them from cf-staging to conda-forge. Anaconda.org services then index all conda-forge packages in the corresponding repodata.json.

Staging serves two purposes then:

  • Limiting access to the main channel
  • Avoiding early publication of a problematic artifact

How do we do this with OCI artifacts? The limitations are:

  • We only have one organization (channel-mirrors) so far, so feedstocks would get access to the "main" channel. This might not be as problematic as it sounds, but we need to ensure that's the case. [1]
  • Artifact metadata needs to be added before the upload, and I am not aware of a mechanism that allows metadata modification after the upload.
  • Copying artifacts from one channel to another involves downloads, uploads and some API calls (definitely more expensive than the single COPY request to Anaconda.org). [2]
  • We need to handle our own conda-index-equivalent process with remote artifacts.

So, all in all, I think we can run everything off the channel-mirrors organization. We just need to devise a different staging mechanism. I suggest:

  1. We mimic what Homebrew does and publish our own repodata using a similar approach. [3]
  2. Come up with a way to mark an artifact as ready for publication after an upload. Annotations and labels seem to be pre-upload only, but maybe GH has a field we can use, like visibility or something. [4]
  3. Let feedstocks upload (only upload) to channel-mirrors and have the validation service run the needed checks on the new artifacts.
  4. If a package passes, the required metadata is modified accordingly, and the artifact will be published to the repodata in the next scheduled run.
  5. If it doesn't, the required metadata won't be present, and the package will be deleted in the next scheduled run (a different workflow than in step 4; a sketch of this sweep follows below). Accidental deletions can still be recovered within the 30-day window.
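
A rough sketch of the scheduled cleanup in step 5, using the GitHub Packages REST API to list and delete container package versions; the validation-marker check is a hypothetical placeholder for whatever step 2 ends up using:

```python
# Sketch of the scheduled sweep from step 5. The list/delete endpoints
# are from the GitHub Packages REST API; is_marked_valid is hypothetical.
import requests

ORG = "channel-mirrors"
API = "https://api.github.com"
HEADERS = {
    "Authorization": "Bearer ...",
    "Accept": "application/vnd.github+json",
}

def is_marked_valid(version: dict) -> bool:
    # Hypothetical: check the "ready for publication" marker from step 2.
    raise NotImplementedError

def sweep(package_name: str) -> None:
    # First page only; pagination omitted for brevity.
    versions = requests.get(
        f"{API}/orgs/{ORG}/packages/container/{package_name}/versions",
        headers=HEADERS,
    ).json()
    for v in versions:
        if not is_marked_valid(v):
            # Deleted versions stay recoverable for 30 days.
            requests.delete(
                f"{API}/orgs/{ORG}/packages/container/{package_name}"
                f"/versions/{v['id']}",
                headers=HEADERS,
            ).raise_for_status()
```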

Footnotes

  1. Permission-wise, GH distinguishes between read, write and delete, which means that a properly scoped token used by feedstocks could at worst write too many things, but in no way delete existing blobs. Note these tokens are NOT fine-grained.
    There's also a 30-day restore window if necessary. Deleted packages are available in the Settings UI.
    Package overwriting shouldn't be possible (it would be a different hash anyway). The risks of a cross-feedstock publication are low as long as we have a validation process in place.

  2. I read the OCI spec, and apparently it supports the notion of "mounting blobs" from other repositories within the same registry. This means it could mimic the cf-staging to conda-forge setup on Anaconda.org. The GitHub Packages API doesn't seem to support mounting, though; there are also some issues about it online, still open.

  3. See how Homebrew does this with 15-minute scheduled jobs; even the API is pre-generated JSON deployed to GH Pages in an environment. Their biggest payload is 20 MB of pure JSON, though. These point to sha256 digests in ghcr.io. Search uses Algolia too!

  4. See this for OCI annotations. I don't know if they can be added after an upload. What about tags? Can they be modified, added or removed? Right now tags encode the version and the build string. The UI does distinguish tagged vs. untagged.

@Hind-M (Member, Author) commented Jun 19, 2023

Come up with a way to mark an artifact as ready for publication after an upload. Annotations and labels seem to be pre-upload only, but maybe GH has a field we can use, like visibility or something.

  • I believe labels were superseded by annotations (https://github.com/opencontainers/image-spec/blob/main/annotations.md#back-compatibility-with-label-schema), and these indeed cannot be edited after building the artifact. But there is an interesting solution where we could add annotations to existing artifacts by creating a separate ORAS Artifact Manifest that refers to the original one via its digest and lives in the same repository (see the sketch after this list).
    I suppose that tags could also be a solution.

  • Visibility does exist, apparently (see listing packages for an organization), and it can be public, private or internal.

  • For the staging strategy, to be sure that I understood correctly: do you mean not using a staging area and distinguishing the ready artifacts only via metadata/annotations?
    When you say running everything off the channel-mirrors organization, where would that be? (Within the organization of the corresponding GH repository we are packaging, for example?)
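
For illustration, a sketch of such a post-hoc annotation using the OCI 1.1 referrers mechanism (an image manifest with a `subject` field, the successor to the ORAS Artifact Manifest approach); whether ghcr.io supports this is exactly what would need checking. Repo, digests, sizes, artifact type and annotation key are all placeholders:

```python
# Rough sketch of post-hoc annotation via an OCI 1.1 referrer manifest
# whose `subject` points at the original artifact. Assumes the 2-byte
# empty blob "{}" was already pushed to the repository and that the
# registry implements OCI 1.1 referrers.
import json
import requests

REGISTRY = "https://ghcr.io"
REPO = "channel-mirrors/some-package"  # hypothetical repository

EMPTY = {  # the well-known OCI "empty" descriptor (sha256 of "{}")
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
}
SUBJECT = {  # descriptor of the original package manifest
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:...",
    "size": 1234,
}

referrer = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "artifactType": "application/vnd.conda-forge.validation",  # made-up type
    "config": EMPTY,
    "layers": [EMPTY],
    "subject": SUBJECT,
    "annotations": {"org.conda-forge.validated": "true"},  # made-up key
}

requests.put(
    f"{REGISTRY}/v2/{REPO}/manifests/validation-marker",  # arbitrary tag
    headers={
        "Authorization": "Bearer ...",
        "Content-Type": "application/vnd.oci.image.manifest.v1+json",
    },
    data=json.dumps(referrer),
).raise_for_status()
```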

@jaimergp (Member)

but there is an interesting solution where we could add annotations to existing artifacts by creating a separate ORAS Artifact Manifest that refers to the original one via its digest and lives in the same repository.

That repo (johnsonshi/annotate-registry-artifacts) is indeed interesting. I am concerned about the permissions here, because in principle any feedstock could add the metadata bit that says "yes, it is a valid artifact", unless we put that info somewhere else 🤔 Or maybe we need to check.


About visibility, I read a bit more into it and, while it could work, we must note that:

Warning: Once you make a package public, you cannot make it private again.

So we would have to upload a package as private, then run the validation and either publish it as public or delete it. I don't know if the number of packages marked as "private" counts towards some kind of quota, but hopefully the number of artifacts marked as such at any given time is small.


For the staging strategy, to be sure that I understood correctly: do you mean not using a staging area and distinguishing the ready artifacts only via metadata/annotations?

Correct, that's my proposal so far.

When you say running everything off the channel-mirrors organization, where would that be? (Within the organization of the corresponding GH repository we are packaging, for example?)

Maybe a repo like channel-mirrors/index or channel-mirrors/repodata. Maybe this can be published to the OCI registry too (instead of GH Pages), but it needs to run on some sort of cronjob anyway, and I am assuming the Homebrew folks went with GH Pages for a good reason.

@jaimergp (Member) commented Jul 8, 2023

We discussed this approach in the monthly bot meeting, and Matt raised a point I had not considered: the repodata.json schema doesn't allow external URLs for packages; it assumes that files will be co-located next to the repodata.json. So either:

  • we provide a thousand redirection endpoints in the GH Pages "channel", or...
  • we submit the necessary CEPs to adjust the repodata schema to (optionally) allow external URLs, which would take precedence over the next-to-repodata assumption (a sketch of what that could look like follows below)
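
For illustration, a hypothetical version of the second option, written as the dict a client would parse; the base_url field shown here did not exist in the schema at the time and is precisely what such a CEP would have to introduce:

```python
# Hypothetical repodata.json extension. The "base_url" key is invented
# for illustration; introducing something like it is what the CEP is for.
repodata = {
    "info": {
        "subdir": "noarch",
        # Would take precedence over "files sit next to repodata.json".
        "base_url": "https://ghcr.io/channel-mirrors/conda-forge",  # hypothetical
    },
    "packages.conda": {
        "some-package-1.0.0-pyhd8ed1ab_0.conda": {
            "sha256": "...",
            "depends": [],
        },
    },
}
```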

@jaimergp (Member)

@Hind-M and I met with @wolfv today and discussed potential alternatives:

Staging:

  • Instead of a single organization (channel-mirrors), we can add a second one; e.g. channel-mirrors-staging
  • Feedstocks upload to staging, and do not have access to production
  • A cronjob at channel-mirrors will periodically run validation checks on staging, and promote the valid packages to production. If they don't pass, they are deleted.

Repodata publication:

  • Only for packages in channel-mirrors
  • It can be served as an OCI artifact, or on GH Pages à la brew.
  • We need to add a plugin to conda to handle OCI-backed channels. This plugin will also be responsible for "figuring out" the OCI URL for each package in the repodata.json (see the sketch after this list). We might just need an "endpoint" URL in the repodata header instead of a per-artifact URL.
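
A sketch of that "figuring out" step: mapping a repodata filename to an OCI reference. The naming scheme shown (one repository per subdir/name, a version-build tag) is an assumption, not a settled convention:

```python
# Hypothetical filename -> OCI reference mapping; the naming scheme is
# an assumption for illustration only. Requires Python 3.9+ (removesuffix).
def oci_reference(channel: str, subdir: str, filename: str) -> str:
    # conda filenames are <name>-<version>-<build>.<ext>
    name, version, rest = filename.rsplit("-", 2)
    build = rest.removesuffix(".conda").removesuffix(".tar.bz2")
    return f"ghcr.io/channel-mirrors/{channel}/{subdir}/{name}:{version}-{build}"

print(oci_reference("conda-forge", "noarch", "some-package-1.0.0-pyhd8ed1ab_0.conda"))
# -> ghcr.io/channel-mirrors/conda-forge/noarch/some-package:1.0.0-pyhd8ed1ab_0
```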

Some other notes:

  • $GITHUB_TOKEN can't be used with packages in GHA workflows due to scale problems. Instead, one needs to supply a packages:write PAT as a secret to the workflow. In conda-forge, this is best done via our token app or a bot account so the PAT is not tied to a personal account.
  • OCI operations (e.g. artifact download) are authenticated. The PAT is only needed to request a token; the actual operation happens with a different one. This means that we can (theoretically) add more conditions to the per-operation token minting and further reduce the scope to a single package name or something like that. A sketch of this token exchange follows below.
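
A sketch of that token exchange, following the standard registry token flow; the PAT, package name and scope are placeholders:

```python
# The PAT is only used to mint a short-lived, narrowly scoped registry
# token; the actual OCI operation then uses that token instead.
import requests

pat = "ghp_..."  # packages-scoped PAT (placeholder)

resp = requests.get(
    "https://ghcr.io/token",
    params={"scope": "repository:channel-mirrors/some-package:pull"},
    auth=("x", pat),  # basic auth; GHCR accepts the PAT as the password
)
registry_token = resp.json()["token"]

# The download itself authenticates with the minted token, not the PAT.
manifest = requests.get(
    "https://ghcr.io/v2/channel-mirrors/some-package/manifests/1.0.0-0",
    headers={
        "Authorization": f"Bearer {registry_token}",
        "Accept": "application/vnd.oci.image.manifest.v1+json",
    },
)
```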

@isuruf (Member) commented Oct 16, 2023

Instead of a single organization (channel-mirrors), we can add a second one; e.g. channel-mirrors-staging
A cronjob at channel-mirrors will periodically run validation checks on staging, and promote the valid packages to production. If they don't pass, they are deleted.

I'm not sure what the difference is between this approach and using a cronjob at channel-mirrors to download from anaconda.org and push to the channel-mirrors org directly.

@Hind-M (Member, Author) commented Jul 30, 2024

Instead of a single organization (channel-mirrors), we can add a second one; e.g. channel-mirrors-staging
A cronjob at channel-mirrors will periodically run validation checks on staging, and promote the valid packages to production. If they don't pass, they are deleted.

I'm not sure what the difference is between this approach and using a cronjob at channel-mirrors to download from anaconda.org and push to the channel-mirrors org directly.

Because we want to do it independently of anaconda.org.
