Bug 1806143: OpenStack: start using image import when possible #3162

Merged: 4 commits, Feb 27, 2020

@Fedosin
Contributor

Fedosin commented Feb 22, 2020

Currently we use the legacy image upload, which doesn't support image conversion. As a result, we upload qcow2 images to Ceph backends, which don't support this format:
https://access.redhat.com/solutions/2434691

By using the Image Import mechanism we make sure that all uploaded images will be automatically converted to Raw for Ceph (and only for Ceph) backends.

If the image import mechanism is not available, we fall back to the legacy upload.

@openshift-ci-robot
Contributor

@Fedosin: This pull request references Bugzilla bug 1806143, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1806143: OpenStack: start using image import when possible

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 22, 2020
@Fedosin
Contributor Author

Fedosin commented Feb 22, 2020

There are some CI issues, but the code itself works well:
level=debug msg="Creating a Glance image for RHCOS..."
level=debug msg="Image 5y1xwbnp-1bfcb-tdvkm-rhcos was created."
level=debug msg="Checking if the image import mechanism is supported"
level=debug msg="Using legacy API to upload RHCOS to the image 5y1xwbnp-1bfcb-tdvkm-rhcos with ID fcc8d3d5-6294-4bf4-bc89-e4f3656f2994"
level=debug msg="The data was uploaded."

@Fedosin
Contributor Author

Fedosin commented Feb 22, 2020

/retest

@@ -15,12 +15,6 @@ variable "openstack_base_image_name" {
description = "Name of the base image to use for the nodes."
Member

It seems we no longer need the openstack_base_image_name variable.

Contributor Author

I was going to remove it too, but then I realized that we also need to support overridden image names (for images that are not created by the installer). This means we have to provide these names to Terraform somehow, and openstack_base_image_name is the only way to do it.

Member

Oh yeah, of course. I didn't notice it was still used on line 79 👍

imageCreateOpts := images.CreateOpts{
	Name:            imageName,
	ContainerFormat: "bare",
	DiskFormat:      "raw",
Member

Shouldn't DiskFormat depend on the actual image type rather than being hardcoded to raw?

I find it weird that we need to set this when importing an image, because the whole point is to let Glance decide what the best image format for the backend is...

Contributor Author

Yeah, it should be qcow2 here for sure. The image conversion plugin will change it if necessary upon image activation.
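
One way to make DiskFormat follow the actual image type, as the question above suggests, would be to inspect the image header. The following is a minimal sketch under that assumption, not what the PR does (the reply above simply settles on qcow2). guessDiskFormat is a hypothetical helper, and it only distinguishes qcow2 from everything else, which it reports as raw.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// qcowMagic is the 4-byte signature ("QFI\xfb") at the start of QCOW files.
var qcowMagic = []byte{'Q', 'F', 'I', 0xfb}

// guessDiskFormat is a hypothetical helper: it returns "qcow2" when the
// header carries the QCOW magic and a format version of 2 or higher.
// Anything else is reported as "raw", which is a simplification.
func guessDiskFormat(header []byte) string {
	if len(header) >= 8 && bytes.HasPrefix(header, qcowMagic) {
		// Bytes 4-7 hold the big-endian format version.
		if binary.BigEndian.Uint32(header[4:8]) >= 2 {
			return "qcow2"
		}
	}
	return "raw"
}

func main() {
	qcow2Header := []byte{'Q', 'F', 'I', 0xfb, 0, 0, 0, 3}
	fmt.Println(guessDiskFormat(qcow2Header))
	fmt.Println(guessDiskFormat(make([]byte, 8)))
}
```

The returned string could then be used as the DiskFormat value instead of a hardcoded constant.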


useImageImport, err := isImageImportSupported(cloud)
if err != nil {
	return err
Member

An error while checking whether Glance supports image import should result in us assuming it does not, rather than failing the image upload.

Contributor Author

I check what type of error Glance returns. If this is 404 because of the lack of the Discovery API, then isImageImportSupported ignores the error and returns false, nil. The function returns only real errors, for example, when Glance service is not available. In this case it's correct to stop the execution.
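
The behavior described above (404 means "not supported", any other failure is a real error) can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: importSupported is a hypothetical function that receives an already-fetched HTTP status and body, and the JSON shape follows the import discovery response of the OpenStack Image API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// importInfo mirrors the import discovery response body.
type importInfo struct {
	ImportMethods struct {
		Description string   `json:"description"`
		Type        string   `json:"type"`
		Value       []string `json:"value"`
	} `json:"import-methods"`
}

// importSupported is a hypothetical stand-in for isImageImportSupported.
// A 404 is treated as "discovery API absent" and is not an error;
// every other non-200 status is reported to the caller.
func importSupported(status int, body []byte) (bool, error) {
	switch {
	case status == http.StatusNotFound:
		return false, nil
	case status != http.StatusOK:
		return false, fmt.Errorf("unexpected status %d from import discovery", status)
	}
	var info importInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return false, fmt.Errorf("decoding discovery response: %w", err)
	}
	for _, method := range info.ImportMethods.Value {
		if method == "glance-direct" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	body := []byte(`{"import-methods": {"description": "Import methods available.", "type": "array", "value": ["glance-direct", "web-download"]}}`)
	fmt.Println(importSupported(http.StatusOK, body))
	fmt.Println(importSupported(http.StatusNotFound, nil))
	fmt.Println(importSupported(http.StatusServiceUnavailable, nil))
}
```

Keeping the parsing separate from the HTTP call makes the three outcomes (supported, unsupported, error) easy to exercise without a running Glance.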

Member

The error could be anything, a 5xx because glance discovery API is broken, networking issue, ...
My point being, a failure in isImageImportSupported() is recoverable and should not stop execution: we can assume the cloud does not support glance direct image import.

Failure of Glance image discovery at time T doesn't imply failure of Glance image upload at time T+1. We're catching errors from the image upload, aren't we?
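
The lenient policy argued for in this comment could look roughly like the sketch below: any failure of the discovery probe is logged and treated as "image import not supported". All names here (probeFunc, lenientImportSupported, failingProbe, workingProbe) are hypothetical and only illustrate the idea.

```go
package main

import (
	"errors"
	"log"
)

// probeFunc stands in for the discovery check against Glance.
type probeFunc func() (bool, error)

// lenientImportSupported never fails: a probe error is logged and
// interpreted as "not supported", so the caller uses the legacy upload.
func lenientImportSupported(probe probeFunc) bool {
	supported, err := probe()
	if err != nil {
		log.Printf("warning: image import discovery failed, assuming it is unsupported: %v", err)
		return false
	}
	return supported
}

// failingProbe simulates a broken discovery endpoint.
func failingProbe() (bool, error) {
	return false, errors.New("503 from discovery endpoint")
}

// workingProbe simulates a cloud that advertises image import.
func workingProbe() (bool, error) {
	return true, nil
}

func main() {
	log.Println("failing probe ->", lenientImportSupported(failingProbe))
	log.Println("working probe ->", lenientImportSupported(workingProbe))
}
```

The trade-off is that a transient failure silently changes which upload path is used, which is the concern raised in the reply that follows.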

Contributor Author

I don't really like this decision. It leads to unpredictable system behavior when, for example, a short network failure makes the installer believe that image import is unavailable when it is in fact available.
In my opinion, if Glance returns an error, we should inform the user about it rather than hide it.

Member

@pierreprinetti pierreprinetti Feb 25, 2020

There might be a higher-level concern on resilience, but I am unaware of that; and by default I am all for Mike's explicit error.

If I'm not mistaken, the rationale for using image-import is to enable the upload of images in cases where the legacy upload would fail.

Thinking of this user story (legacy method fails), the user would probably be surprised to see the upload fail despite their cluster having image-upload capability.

I'd really fail here with an explicit error, and let them try again from scratch.


As an alternative, I'd consider removing the check and always uploading with image-import, and noisily falling back to the legacy upload in case of failure.
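
The alternative proposed here (skip the check, always try image import, and fall back noisily) might be sketched as follows. This is only an illustration of the control flow: uploadFunc, uploadWithFallback, succeed, and fail are hypothetical names, and the real upload calls are replaced by stubs.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// uploadFunc stands in for one upload strategy (import or legacy).
type uploadFunc func() error

// uploadWithFallback tries the image import first. On failure it logs a
// warning and retries with the legacy upload. If both fail, the returned
// error mentions both causes.
func uploadWithFallback(importUpload, legacyUpload uploadFunc) error {
	importErr := importUpload()
	if importErr == nil {
		return nil
	}
	log.Printf("warning: image import failed (%v); falling back to legacy upload", importErr)
	if err := legacyUpload(); err != nil {
		return fmt.Errorf("legacy upload failed after image import error (%v): %w", importErr, err)
	}
	return nil
}

// succeed and fail are stubs used to exercise both paths.
func succeed() error { return nil }
func fail() error    { return errors.New("upload failed") }

func main() {
	fmt.Println(uploadWithFallback(fail, succeed))
	fmt.Println(uploadWithFallback(fail, fail))
}
```

Because the fallback is logged, the user can still see that the preferred path failed, which addresses the objection to hiding Glance errors.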

	return err
}

if useImageImport {
Member

Could you find a cloud with glance direct image import to test this?

Contributor Author

I tested it on manila-env-like devstack with Glance and Ceph... This is not really cool, I know. We'll need to obtain a real OSP13 env, where we can test the solution properly.

pkg/tfvars/openstack/rhcos_image.go
}

// Next two checks are just to make sure the response data was not corrupted
if s.ImportMethods.Description != "Import methods available." {
Member

I like the check just below this; so much so, that this check on "description" looks redundant. Moreover, relying on a strict check of a descriptive field looks brittle to me.

I'd remove the check on "Description", unless there's some additional reason for keeping it.

Contributor Author

Agreed, there is no need for this check. I added it because it is a constant string and part of the API: https://docs.openstack.org/api-ref/image/v2/?expanded=import-methods-and-values-discovery-detail#image-service-info-discovery

		return true, nil
	}
}

Member

What about adding this?

logrus.Debugln("Glance Direct image import plugin was not found")

Contributor Author

done

}

if useImageImport {
	logrus.Debugf("Using Image Import API to upload RHCOS to the image %v with ID %v", img.Name, img.ID)
Member

nit: you could use %q instead of %v, as they're both strings

Contributor Author

done
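
For readers unfamiliar with the nit above, this small example shows how the two verbs differ for strings. formatBoth is a hypothetical helper written only for this illustration.

```go
package main

import "fmt"

// formatBoth renders the same string with %v (as-is) and %q
// (double-quoted, with special characters escaped).
func formatBoth(name string) (plain, quoted string) {
	return fmt.Sprintf("%v", name), fmt.Sprintf("%q", name)
}

func main() {
	plain, quoted := formatBoth("rhcos image")
	fmt.Println(plain)
	fmt.Println(quoted)
}
```

Quoting makes empty strings and stray whitespace visible in debug logs, which is the usual reason to prefer %q for names and IDs.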

@pierreprinetti
Member

/approve

@pierreprinetti
Member

Oh, the Gophercloud version bump requires an installer-approver.

@sdodson
Member

sdodson commented Feb 25, 2020

/approve
/cc @hardys
PTAL at the gophercloud bump, metal3 seems to be the only other consumer of that library.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pierreprinetti, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2020
@pierreprinetti
Member

/lgtm

@mandre
I am LGTMing this because as far as my understanding reaches, the patch looks fine.
Please let's circle back to the error handling discussion once you're back!

/label platform/openstack

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. platform/openstack labels Feb 27, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit ec87e64 into openshift:master Feb 27, 2020
@openshift-ci-robot
Contributor

@Fedosin: All pull requests linked via external trackers have merged. Bugzilla bug 1806143 has been moved to the MODIFIED state.

In response to this:

Bug 1806143: OpenStack: start using image import when possible

@openshift-ci-robot
Contributor

@Fedosin: The following test failed, say /retest to rerun all failed tests:

Test name            Commit   Details  Rerun command
ci/prow/e2e-libvirt  0d92968  link     /test e2e-libvirt

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@Fedosin
Contributor Author

Fedosin commented Mar 16, 2020

/cherry-pick release-4.4

@openshift-cherrypick-robot

@Fedosin: #3162 failed to apply on top of branch "release-4.4":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	data/data/openstack/main.tf
M	data/data/openstack/variables-openstack.tf
Falling back to patching base and 3-way merge...
Auto-merging data/data/openstack/variables-openstack.tf
Auto-merging data/data/openstack/main.tf
CONFLICT (content): Merge conflict in data/data/openstack/main.tf
Patch failed at 0001 Upload RHCOS images directly with Gophercloud

In response to this:

/cherry-pick release-4.4

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
platform/openstack
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
8 participants