-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1806143: OpenStack: start using image import when possible #3162
Conversation
@Fedosin: This pull request references Bugzilla bug 1806143, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There are some CI issues, the code itself works well: |
/retest |
@@ -15,12 +15,6 @@ variable "openstack_base_image_name" { | |||
description = "Name of the base image to use for the nodes." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer need the openstack_base_image_name
variable it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to remove it too, but then I realized that we also need to support overridden image names (which are not created by the installer). It means that we have to to provide these names to Terraform somehow, and openstack_base_image_name
is the only way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, of course. I didn't notice it was still used on line 79 👍
pkg/tfvars/openstack/rhcos_image.go
Outdated
imageCreateOpts := images.CreateOpts{ | ||
Name: imageName, | ||
ContainerFormat: "bare", | ||
DiskFormat: "raw", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't DiskFormat
depends on the actual image type rather than hardcoding it to raw?
I find it weird that we need to set this when importing an image, because the whole point is to let glance decide what it the best image format for the backend...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should be qcow2
here for sure. The image conversion plugin will change it if necessary upon image activation.
|
||
useImageImport, err := isImageImportSupported(cloud) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error checking whether glance supports image import should result is us assuming it does not rather than failing the image upload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I check what type of error Glance returns. If this is 404 because of the lack of the Discovery API, then isImageImportSupported
ignores the error and returns false, nil
. The function returns only real errors, for example, when Glance service is not available. In this case it's correct to stop the execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error could be anything, a 5xx because glance discovery API is broken, networking issue, ...
My point being, a failure in isImageImportSupported()
is recoverable and should not stop execution: we can assume the cloud does not support glance direct image import.
Failure of glance image discovery at time T doesn't imply failure of glance image upload at time T+1. We're catching error of image upload, aren't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this decision. It leads to unpredictable system behavior when, for example, due to a short network failure, the installer believes that the image import is unavailable, but it is not.
As for me, so if Glance returns an error, we should inform the user about it, and do not hide it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a higher-level concern on resilience, but I am unaware of that; and by default I am all for Mike's explicit error.
If I'm not mistaken, the rationale for using image-import is to enable the upload of images in cases where the legacy upload would fail.
Thinking of this user story (legacy method fails), the user would probably be surprised of seeing that the upload fails despite his cluster having image-upload capability.
I'd really fail here with an explicit error, and let them try again from scratch.
As an alternative, I'd consider removing the check and always uploading with image-import, and noisily falling back to the legacy upload in case of failure.
return err | ||
} | ||
|
||
if useImageImport { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you find a cloud with glance direct image import to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it on manila-env-like devstack with Glance and Ceph... This is not really cool, I know. We'll need to obtain a real OSP13 env, where we can test the solution properly.
9b67d7f
to
54889be
Compare
pkg/tfvars/openstack/rhcos_image.go
Outdated
} | ||
|
||
// Next two checks are just to make sure the response data was not corrupted | ||
if s.ImportMethods.Description != "Import methods available." { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the check just below this; so much so, that this check on "description" looks redundant. Moreover, relying on a strict check of a descriptive field looks brittle to me.
I'd remove the check on "Description", unless there's some additional reason for keeping it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that there is no need to have this check. I added it because this is a constant string and it's a part of the API https://docs.openstack.org/api-ref/image/v2/?expanded=import-methods-and-values-discovery-detail#image-service-info-discovery
return true, nil | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about:
logrus.Debugln("Glance Direct image import plugin was not found")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/tfvars/openstack/rhcos_image.go
Outdated
} | ||
|
||
if useImageImport { | ||
logrus.Debugf("Using Image Import API to upload RHCOS to the image %v with ID %v", img.Name, img.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could use %q
instead of %v
, as they're both strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/approve |
oh Gophercloud's version bump requires an installer-approver |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pierreprinetti, sdodson The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Now we use Terraform to create images in Glance. This commit starts using direct Gophercloud calls for this.
54889be
to
0d92968
Compare
/lgtm @mandre /label platform/openstack |
/retest Please review the full test history for this PR and help us cut down flakes. |
@Fedosin: All pull requests linked via external trackers have merged. Bugzilla bug 1806143 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Fedosin: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/cherry-pick release-4.4 |
@Fedosin: #3162 failed to apply on top of branch "release-4.4":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Now we use legacy image uploading that doesn't support image conversion. It leads to the fact that we upload qcow2 images to Ceph backend, which doesn't support this format:
https://access.redhat.com/solutions/2434691
By using the Image Import mechanism we make sure that all uploaded images will be automatically converted to Raw for Ceph (and only for Ceph) backends.
If the image import mechanism is not available we fallback to the legacy uploading.