Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feature): add direct image registry client Unpacker implementation #145

Merged

Conversation

everettraven
Copy link
Collaborator

@everettraven everettraven commented Aug 25, 2023

Description

  • Replaces the pod based Unpacker implementations with one that communicates directly with an image registry
  • Adds unit tests for the image registry Unpacker implementation
  • Updates e2e tests to push test images to an image registry running on the e2e cluster
  • Adds permissions to get Secrets in the catalogd-system namespace. This is so we can pull images that require authentication by using a pull secret

Motivation

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 25, 2023
@codecov
Copy link

codecov bot commented Aug 25, 2023

Codecov Report

Attention: 67 lines in your changes are missing coverage. Please review.

Comparison is base (8a90d8d) 87.02% compared to head (9fea4bc) 49.72%.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #145       +/-   ##
===========================================
- Coverage   87.02%   49.72%   -37.30%     
===========================================
  Files           3        6        +3     
  Lines         131      366      +235     
===========================================
+ Hits          114      182       +68     
- Misses         10      163      +153     
- Partials        7       21       +14     
Files Coverage Δ
pkg/controllers/core/catalog_controller.go 90.19% <20.00%> (-3.62%) ⬇️
internal/source/unpacker.go 0.00% <0.00%> (ø)
cmd/manager/main.go 9.92% <29.72%> (ø)
internal/source/image_registry_client.go 66.66% <66.66%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2023
@everettraven everettraven marked this pull request as ready for review September 7, 2023 20:36
@everettraven everettraven requested a review from a team as a code owner September 7, 2023 20:36
@everettraven everettraven changed the title wip: use image registry client for image unpacking instead of pods (feature): add feature-gated image registry client Unpacker implementation Sep 7, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 7, 2023
// TODO: Add garbage collection to remove any unused
// images/SHAs that exist in the cache

// TODO: Make asynchronous
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/e2e/unpack_test.go Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
Copy link
Member

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have one (yet) but I think it's useful to consider writing code for controllers as if there were a long-lived deployment you are on the pager for. Every commit that merges to the codebase deploys, and you need to keep in mind that branch cuts for future releases can happen at any point.

The implication here is that if we were to cut a release of catalogd with this PR, the server would cache image layers forever and not do any cleanup. Prefer when adding new features to a controller to handle (minimally) the full lifecycle.

There are a couple layers of indirection in the codebase so I don't know how the basic intuition translates to this case, but writing the full lifecycle code in a controller context is fairly straightforward.

config/rbac/role.yaml Show resolved Hide resolved
config/rbac/role_binding.yaml Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
internal/source/image_registry_client.go Show resolved Hide resolved
internal/source/image_registry_client.go Outdated Show resolved Hide resolved
internal/source/unpacker.go Outdated Show resolved Hide resolved
@everettraven
Copy link
Collaborator Author

We don't have one (yet) but I think it's useful to consider writing code for controllers as if there were a long-lived deployment you are on the pager for. Every commit that merges to the codebase deploys, and you need to keep in mind that branch cuts for future releases can happen at any point.

The implication here is that if we were to cut a release of catalogd with this PR, the server would cache image layers forever and not do any cleanup. Prefer when adding new features to a controller to handle (minimally) the full lifecycle.

Maybe I'm misunderstanding, but I thought that is what feature gates are used for? As far as I understand it, the feature gate should signal that a feature is minimally functional (IMO this is - it does what it needs to even if not optimized) but is still a work in progress, is bound to change, and could need optimizations. This feature gate should be disabled by default and therefore this implication shouldn't happen unless the feature gate is explicitly enabled.

If I am misunderstanding I am happy to add the garbage collection logic. My intention here was to simply get the basic implementation in and then iterate from there.

There are a couple layers of indirection in the codebase so I don't know how the basic intuition translates to this case, but writing the full lifecycle code in a controller context is fairly straightforward.

I don't think it would be too complex to do a minimal implementation of garbage collection. As I mentioned above, I don't mind adding it if I misunderstood the message sent by sticking this functionality behind a feature gate and that we could get these optimization type things in iteratively.

@joelanford
Copy link
Member

joelanford commented Sep 8, 2023

IMO, it is reasonable to deliver half-baked features behind a disabled-by-default feature gate. It would be good to have a design that enumerates the plan for how the feature is delivered and what the proposed maturity process is for the feature gate.

Is there a brief or RFC for this? Seems big enough to warrant one.

@stevekuznetsov
Copy link
Member

I'm less interested in Google Docs than making the development easier. Since lifecycle is the bread and butter of what controllers do, it's usually useful to tackle it (minimally) whole-hog so that you don't make more refactor work for yourself in the future on code you just wrote.

@joelanford
Copy link
Member

I don't disagree. I'd guess that we may not all agree what our definitions of "minimally" are though. That's where a high-level plan could help achieve alignment on expectations.

@everettraven
Copy link
Collaborator Author

Is there a brief or RFC for this? Seems big enough to warrant one.

There is not

Copy link
Collaborator

@anik120 anik120 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the RFC.

config/rbac/role.yaml Outdated Show resolved Hide resolved
config/default/manager_auth_proxy_patch.yaml Outdated Show resolved Hide resolved
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 8, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 8, 2023
pkg/errors/unrecoverable.go Outdated Show resolved Hide resolved
pkg/errors/unrecoverable.go Outdated Show resolved Hide resolved
test/tools/imageregistry/imgreg.yaml Outdated Show resolved Hide resolved
test/tools/imageregistry/imgreg.yaml Outdated Show resolved Hide resolved
test/tools/imageregistry/registry.sh Show resolved Hide resolved
Comment on lines +71 to +74
image-registry: ## Setup in-cluster image registry
./test/tools/imageregistry/registry.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen https://github.com/tilt-dev/ctlptl?

Since we're already tilt-enabled, perhaps this is a reasonable tool to evaluate for our cluster setup purposes, especially since it seems to support setting up image registries that are
simultaneously accessible to the local dev environment and the in-cluster processes.

Can we spend 30 minutes now (but time boxed) evaluating this to see if it would be a drop in replacement?

If it is a good candidate, it would be fine to integrate in a follow-up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen that. I played around with doing it the kind way and wasn't able to get that working well (there seemed to be resolution issues from the direct client and it couldn't find the registry when using both the names mentioned in the docs). My understanding is that ctlptl pretty much does the steps mentioned in the kind docs for you and would, in theory, have the same results so i didn't give it a shot.

That being said, I'm happy to time box an evaluation of it. I'll circle back around to investigating this after addressing other things since this would be a follow-up anyways

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a shot and ran into the same issues I had with the approach documented by kind. It just can't seem to resolve the registry URL. Maybe this is just an issue on my machine though? I haven't tested it on my Mac, but I've had kind related troubles in the past on my work laptop running Fedora so 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good. Maybe make a tracker for this, even if it's just a way to mentally bookmark this apparent gap in the ecosystem for cluster+registry tooling. It seems like progress is being made, so perhaps we can revisit again in 3-6 months.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd/manager/main.go Outdated Show resolved Hide resolved
internal/source/unpacker.go Show resolved Hide resolved
test/tools/imageregistry/imagebuilder.yaml Outdated Show resolved Hide resolved
Comment on lines 17 to 21
metadata:
name: my-selfsigned-ca
namespace: catalogd-e2e
spec:
isCA: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused here. Is this a CA or is this the docker registry's server certificate, or both?

I think what we need is just a self-signed server keypair, and then clients just need the server cert to trust. But maybe I'm missing something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it is both. We use the tls.crt and tls.key entries in the generated secret to be used by the image registry and we inject the ca.crt entry to the necessary locations so that clients trust the cert used by the image registry. I tried it some different ways to no avail, but it could just be due to my lack of experience with cert-manager (and certificate management in general)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving as is for now. This should be addressed by #188

test/tools/imageregistry/imgreg.yaml Outdated Show resolved Hide resolved
spec:
containers:
- name: registry
image: registry:2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming from Docker Hub, which is rate-limited, and can cause flakes. Can we find a registry container in quay (or some other better image registry)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try to use Project Quay as the image registry but that would require a good bit more effort to investigate how to properly configure this. This image is Apache 2.0 licensed so if we are concerned about the flakes we could probably pull from dockerhub, re-tag, and push to a quay repo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is writing our own simple registry in Go and building+loading that image onto the kind cluster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think registry:2 is fine for now. I'm not seeing it obviously mirrored into another registry. Perhaps a follow-up issue and/or comment in the YAML about this for now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Just to try it out, I testing what it would take to write a super barebones registry in Go. Was able to do it with:

package main

import (
	"log"
	"net/http"
	"os"

	"github.com/google/go-containerregistry/pkg/registry"
)

func main() {
	certFile, certEnvVarExists := os.LookupEnv("REGISTRY_HTTP_TLS_CERTIFICATE")
	if !certEnvVarExists {
		if err := http.ListenAndServe(":5000", registry.New()); err != nil {
			log.Fatal(err)
		}
	}

	keyFile, keyEnvVarExists := os.LookupEnv("REGISTRY_HTTP_TLS_KEY")
	if !keyEnvVarExists {
		log.Fatal("environment variable REGISTRY_HTTP_TLS_CERTIFICATE specified but REGISTRY_HTTP_TLS_KEY was not")
	}
	if err := http.ListenAndServeTLS(":5000", certFile, keyFile, registry.New(registry.Logger(log.Default()))); err != nil {
		log.Fatal(err)
	}
}

I built and loaded an image with this for e2e and it passed when run locally. I won't push this up because it was a quick poc, but something I thought would be worth sharing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up issue: #189

Makefile Outdated Show resolved Hide resolved
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@@ -0,0 +1,97 @@
package main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this really matters, but I'd suggest (for a follow-up) moving the GC function into a separate package so that we can avoid a unit test in the main package.

Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments about potential follow-ups. But LGTM.

@fgiloux
Copy link

fgiloux commented Oct 9, 2023

It seems that the selected library does not support mirroring. What is the plan in this regard?

@joelanford
Copy link
Member

It seems that the selected library does not support mirroring. What is the plan in this regard?

This is a known (and out-of-scope for now) feature that needs to be implemented. The RFC calls this out and says that we'll cover it in a future RFC.

AuthNamespace string
}

const ConfigDirLabel = "operators.operatorframework.io.index.configs.v1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably switch to using ConfigsLocationLabel from here without creating a new constant variable

return fmt.Errorf("error parsing remote image %q config file: %w", imgRef.Name(), err)
}

dirToUnpack, ok := cfgFile.Config.Labels[ConfigDirLabel]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is applicable here

@ncdc ncdc added this to the v0.8.0 milestone Oct 10, 2023
Copy link
Collaborator

@anik120 anik120 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The most important follow up to capture for me would be the reassessing of the two interfaces, Unpack and Store, and the use of a cache in Unpack in that context.

containers:
- name: manager
volumeMounts:
- mountPath: /etc/ssl/certs/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have this context captured as a comment here somewhere somehow (I'm not sure what the mechanics for using comments in a kustomization files are). Issues aren't available to a reader of the code, comments are, even if the comment is just pointing to the GitHub issue.

@everettraven everettraven added this pull request to the merge queue Oct 10, 2023
Merged via the queue into operator-framework:main with commit 9f3ba06 Oct 10, 2023
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment