Manifest list resolution breaks CRI-O based clusters #3997

Closed
markusthoemmes opened this issue May 3, 2019 · 15 comments · Fixed by #8192
Labels
area/API API objects and controllers kind/bug Categorizes issue or PR as related to a bug.

Comments

@markusthoemmes
Contributor

Knative version: HEAD

Expected Behavior

I expect all images to work on Knative.

Actual Behavior

Multi-architecture images (published as manifest lists on registries like Docker Hub) don't work.

Steps to Reproduce the Problem

  1. Get a CRI-O-based cluster (like OpenShift).
  2. Run TestCmdArgsService (it uses the Python images).
  3. The test fails.

Relevant information

This is related to cri-o/cri-o#2157. According to @jonjohnsonjr, we only recently switched from resolving just the amd64/linux image to resolving the manifest list.

@markusthoemmes markusthoemmes added the kind/bug Categorizes issue or PR as related to a bug. label May 3, 2019
@markusthoemmes
Contributor Author

@jonjohnsonjr would it be a workable solution to provide a knob to set the architecture to resolve (with the default being to resolve the manifest list and let the node handle the final resolution)?

I think that'd unblock CRI-O-based clusters for now (we can almost certainly safely assume amd64/linux there) until that bug in CRI-O is fixed.
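A minimal sketch of the knob proposed here, assuming a hypothetical environment variable named KNATIVE_RESOLVE_PLATFORM that holds an os/arch pair (nothing in this thread defines such a variable; a ConfigMap key would work the same way). When the knob is unset, the controller keeps the manifest list digest and leaves platform selection to the node; when it is set, the controller would pin the child manifest for that platform.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// platformKnobEnv is a HYPOTHETICAL variable name, used only for illustration.
const platformKnobEnv = "KNATIVE_RESOLVE_PLATFORM"

// parsePlatformKnob interprets the knob value:
//   ""            -> keep the manifest list digest and let the node choose (pin == false)
//   "os/arch"     -> pin the manifest for that platform, e.g. "linux/amd64"
func parsePlatformKnob(value string) (goos, goarch string, pin bool, err error) {
	if value == "" {
		return "", "", false, nil
	}
	parts := strings.Split(value, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false, fmt.Errorf("invalid platform %q: want os/arch", value)
	}
	return parts[0], parts[1], true, nil
}

func main() {
	goos, goarch, pin, err := parsePlatformKnob(os.Getenv(platformKnobEnv))
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case pin:
		fmt.Printf("pinning the %s/%s manifest\n", goos, goarch)
	default:
		fmt.Println("keeping the manifest list digest")
	}
}
```

Defaulting to "unset means manifest list" keeps multi-platform clusters working, while CRI-O clusters could opt in to pinning amd64/linux.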

@jonjohnsonjr
Contributor

We can use runtime.GOOS and runtime.GOARCH to pull the image that matches the controller's platform. This won't work for clusters with heterogeneous nodes, e.g. a master on amd64/linux scheduling jobs on a bunch of Raspberry Pis. I don't see a better option until CRI-O is fixed, though.
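A minimal, stdlib-only sketch of the approach described here: parse a manifest list (the Docker manifest list / OCI image index JSON shape) and pick the entry whose platform matches the runtime.GOOS and runtime.GOARCH of the process doing the resolution. The function name digestForPlatform and the trimmed-down struct types are illustrative assumptions, not Knative's code; a real implementation would also fetch the list from the registry and handle variants such as arm/v7.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

// platform mirrors the "platform" object of a manifest list entry.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	Variant      string `json:"variant,omitempty"` // parsed but ignored in this sketch
}

// descriptor is one entry of the list's "manifests" array.
type descriptor struct {
	MediaType string   `json:"mediaType"`
	Digest    string   `json:"digest"`
	Platform  platform `json:"platform"`
}

// manifestList keeps only the field this sketch needs.
type manifestList struct {
	Manifests []descriptor `json:"manifests"`
}

// digestForPlatform returns the digest of the first child manifest whose
// os/architecture match goos/goarch, or an error if none does.
func digestForPlatform(raw []byte, goos, goarch string) (string, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return "", fmt.Errorf("parsing manifest list: %w", err)
	}
	for _, d := range ml.Manifests {
		if d.Platform.OS == goos && d.Platform.Architecture == goarch {
			return d.Digest, nil
		}
	}
	return "", fmt.Errorf("no manifest for %s/%s", goos, goarch)
}

func main() {
	// Trimmed example list; the digests are placeholders, not real images.
	raw := []byte(`{"manifests": [
		{"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
		{"digest": "sha256:bbb", "platform": {"architecture": "arm64", "os": "linux"}}
	]}`)
	// Use the controller's own platform, as suggested in the comment above.
	digest, err := digestForPlatform(raw, runtime.GOOS, runtime.GOARCH)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved to", digest)
}
```

Because the platform comes from wherever the controller runs, every node in a heterogeneous cluster would be handed the controller's platform, which is exactly the limitation noted in the comment.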

@markusthoemmes
Contributor Author

@jonjohnsonjr what if we put a knob in front of that as proposed above?

@jonjohnsonjr
Contributor

What kind of knob are we thinking? Configmap? Environment variables?

@jonjohnsonjr
Contributor

I think #3998 is going to work in almost every case, so I'm reluctant to add a knob if we'll never need to use it.

@jonjohnsonjr
Contributor

This would also break schema 1 images, but I'm okay with that.

jonjohnsonjr added a commit to jonjohnsonjr/serving that referenced this issue Jun 19, 2019
Fixes knative#4155

For fat manifests, we'll fetch an image by platform because CRI-O is
broken: knative#3997

For everything else, we'll just use the digest we get back from the
initial request.
jonjohnsonjr added further commits with the same message to jonjohnsonjr/serving that referenced this issue on Jun 21, Jun 24 (twice), and Jun 25, 2019.
knative-prow-robot pushed a commit that referenced this issue Jun 25, 2019 (same message, "Fixes #4155").
hohaichi pushed a commit with the same message to hohaichi/serving that referenced this issue Jun 25, 2019.
@mattmoor mattmoor added the area/API API objects and controllers label Jun 26, 2019
@mattmoor mattmoor added this to the Ice Box milestone Jun 26, 2019
@markusthoemmes
Contributor Author

@jonjohnsonjr I've kind of lost track of this. Is this bug still worth tracking? IIRC you've made some other changes to support schema v1 in the meantime?

@jonjohnsonjr
Contributor

This tracks a TODO, so I'd like to keep it open until CRI-O gets fixed so that it doesn't get forgotten. The schema 1 stuff is only kind of related.

Right now we have to choose 2 of these:

  1. Knative does tag resolution
  2. Knative works on multi-platform clusters
  3. Knative works on CRI-O

With current defaults, we have chosen 1 and 3. If you want to use Knative with a multi-platform cluster, you'll have to stop doing tag resolution, which sucks :/

We don't have a "I'm not using CRI-O so I don't care about cri-o/cri-o#2157" flag, so this currently affects everyone.

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2019
@knative-housekeeping-robot

Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.

/lifecycle rotten

@knative-prow-robot knative-prow-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2020
@knative-housekeeping-robot

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

/close

@knative-prow-robot
Contributor

@knative-housekeeping-robot: Closing this issue.

@julz
Member

julz commented May 18, 2020

/reopen
/remove-lifecycle rotten

CRI-O now supports this, so we should be able to fix the TODO.

@knative-prow-robot
Contributor

@julz: Reopened this issue.

@knative-prow-robot knative-prow-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 18, 2020
@markusthoemmes
Contributor Author

I can have a look at our supported OpenShift versions to see whether this now works as expected.

/assign

7 participants