This repository has been archived by the owner on Sep 12, 2022. It is now read-only.

iRODS transfer support for application_to_provider #318


@c-mart c-mart commented Apr 6, 2017

This PR adds support for image data transfer via iRODS data object copy, for the case where the source and destination providers both use the iRODS storage back-end for Glance and share a common iRODS zone. This is very specific to CyVerse Atmosphere(0), and it is not likely that anyone else will use it! The benefit for us is that migrating images becomes very fast: on the order of a minute rather than 10-30+ minutes.

Using iRODS to transfer image data requires us to reach "around" the Glance API, populating the image data in iRODS out-of-band, and then adding the location to the image object in the Glance API. Unfortunately, using Glance in this way has some caveats:

  • Glance does not allow adding a location to an image unless we enable a feature (show_multiple_locations), which presents a security issue for us: it would expose to users a connection string containing credentials for the iRODS service account used by the Glance server. For now, commenting out three lines of code works around this limitation. Such a patch will no longer be necessary in the OpenStack Pike release, when show_multiple_locations is deprecated and we can instead use policy.json to control who can set/get image locations.
  • The image checksum is not populated in the Glance database when the Glance API is not used to upload image data, so we end up with Glance images that have a checksum of None.
    • This means that when we transfer image data using iRODS, application_to_provider.py can no longer use checksums to determine whether image data needs to be transferred. It has to compare image sizes instead, which is not a strong assurance that the bits match.
    • This could be changed in Glance; see my comment on bug 1551498.
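
For reviewers, here is a rough sketch of the out-of-band flow and the size fallback described above. It is not the code in this PR. The function names, the argument names, and the convention that each data object is named after its image UUID are all assumptions made for illustration. It assumes a python-irodsclient version that provides `data_objects.copy` and a glanceclient v2 client. The location URL format depends on the Glance iRODS store, so the caller supplies it.

```python
def copy_image_via_irods(irods_session, glance_client, src_image_id,
                         dst_image_id, src_coll, dst_coll, location_url):
    """Copy image data within one iRODS zone, then register it with Glance.

    irods_session: an irods.session.iRODSSession (python-irodsclient)
    glance_client: a glanceclient v2 client for the destination provider
    location_url:  store-specific URL of the copied object (format not shown)
    Assumption: each data object is named after its Glance image UUID.
    """
    src_path = "{}/{}".format(src_coll.rstrip("/"), src_image_id)
    dst_path = "{}/{}".format(dst_coll.rstrip("/"), dst_image_id)
    # Data object copy happens inside iRODS; no download/upload via Glance.
    irods_session.data_objects.copy(src_path, dst_path)
    # Attach the copied object to the existing, data-less Glance image.
    glance_client.images.add_location(dst_image_id, location_url, {})
    return dst_path


def image_data_matches(src_image, dst_image):
    """Decide whether the destination already holds the source's image data.

    Both arguments are dict-like image records with 'checksum' and 'size'.
    Checksums are compared when both sides have one; otherwise fall back to
    size, which is only a weak signal that the bits match.
    """
    src_sum, dst_sum = src_image.get("checksum"), dst_image.get("checksum")
    if src_sum and dst_sum:
        return src_sum == dst_sum
    src_size, dst_size = src_image.get("size"), dst_image.get("size")
    return src_size is not None and src_size == dst_size
```

Because images populated this way have a checksum of None, the second function ends up on the size branch for every iRODS-transferred image.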

Checklist before merging

  • test final changeset against a fresh development environment, ensure that images migrated using iRODS are launchable
  • resolve python-irodsclient dependency in rtwo
  • ideally get irods/python-irodsclient #67 merged so we don't need to use my fork (not waiting for this to happen)
  • Documentation created/updated at Example link to documentation to give context to the feature (script already referenced here and iRODS support options are documented in script help)
  • Reviewed and approved by at least one other contributor.
  • If necessary, include a snippet in CHANGELOG.md

Checklist after merging

  • For a maintainer: backport change to whymsical-wyvern

@c-mart c-mart force-pushed the application-to-provider-irods-xfer branch from a5d5e5b to df6bdfe Compare April 6, 2017 20:42

c-mart commented Apr 8, 2017

Tested this in a fresh local development environment. In doing so, I ran into several other issues, which ended up taking most of the day. Still, every application I migrated using the iRODS transfer completed successfully and quickly. The longest one (an AMI-based image with kernel and ramdisk image dependencies) took 2 minutes; the others all took under a minute. The script populated what was expected in iRODS, Glance, and the Atmosphere database. Instances do launch. At this point I can't SSH to them, but I'm fairly sure that's an unrelated problem.

Removing WIP label, ready to review! If we are OK with the above caveats, I would like to get this backported to WW and hotfixed to production so I can start bulk-migrating images to the Marana cloud.

@c-mart c-mart changed the title WIP: iRODS transfer support for application_to_provider iRODS transfer support for application_to_provider Apr 8, 2017

@steve-gregory steve-gregory left a comment


Overall looks good; main() could use a little cleanup.

I've gone ahead and merged the corresponding rtwo PR. We will need to update the requirements.txt when the irodsclient PR is merged in and available.

@@ -255,15 +293,42 @@ def main():

local_storage_dir = secrets.LOCAL_STORAGE if os.path.exists(secrets.LOCAL_STORAGE) else "/tmp"
local_path = os.path.join(local_storage_dir, sprov_img_uuid)

def migrate_image_data(img_uuid):
# Todo this function is in an awkward place and relies on 'global' state, unsure of best way to refactor

Yeah, this makes main() pretty terse. Let's move this function out and isolate the arguments you require.


@steve-gregory did this in commit 9440eaa.

OpenStack Glance (https://github.com/cyverse/glance-irods)
- Src. and dst. providers must store images in the same iRODS zone
- --source-provider-id, --irods-conn, --irods-src-coll, and --irods-dst-coll
must all be defined
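
As a possible illustration of the "must all be defined" rule in the help text above, here is a small argparse sketch. It is not the script's actual validation: the function names, and the choice to reject a partial set of iRODS options with an error, are assumptions. Only the four flag names come from the help text.

```python
import argparse

# Options that only make sense together for an iRODS transfer.
IRODS_OPTIONS = ("irods_conn", "irods_src_coll", "irods_dst_coll")


def build_parser():
    """Build a parser with just the four options named in the help text."""
    parser = argparse.ArgumentParser(description="Hypothetical sketch")
    parser.add_argument("--source-provider-id")
    parser.add_argument("--irods-conn")
    parser.add_argument("--irods-src-coll")
    parser.add_argument("--irods-dst-coll")
    return parser


def irods_transfer_requested(args):
    """Return True if iRODS transfer is fully configured, False if unused.

    Raises ValueError when only some of the required options are given
    (an assumed policy; falling back to the Glance API is another option).
    """
    if all(getattr(args, name) is None for name in IRODS_OPTIONS):
        return False
    missing = [name for name in IRODS_OPTIONS if getattr(args, name) is None]
    if args.source_provider_id is None:
        missing.append("source_provider_id")
    if missing:
        flags = ", ".join("--" + name.replace("_", "-") for name in missing)
        raise ValueError("iRODS transfer also requires: " + flags)
    return True
```

With no iRODS options given, the function returns False and the caller would use the plain Glance API transfer.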

Would seem that only one of the two collections would be required if someone wanted to go iRODS -> Glance or Glance -> iRODS?

Just curious how far away we are from a many-to-many relationship 😄


@c-mart c-mart Apr 10, 2017


@steve-gregory, I don't know what you mean. Downloading an image from provider A using iRODS and then uploading it to provider B using Glance, and vice versa? I haven't built support for that. Regardless of which storage back-end is in use, why would someone want to do that over Glance's preferred way of using the API for the entire migration? The only reason to use iRODS transfer is for the speedup of iRODS data object copy, and for that you wouldn't want to be uploading or downloading image data via Glance API.

I don't know what you mean by "many-to-many relationship" in this context. Can you please clarify?

@steve-gregory steve-gregory changed the base branch from master to xylotomous-xenops April 10, 2017 16:38
@c-mart c-mart force-pushed the application-to-provider-irods-xfer branch from 0e7079a to 73ae24d Compare April 10, 2017 18:25

c-mart commented Apr 11, 2017

Made script more functional and a few other improvements. Tested to ensure it still migrates images both using iRODS transfer and pure Glance API transfer. Feel free to merge if there are no other requested changes applicable to this PR.

@steve-gregory steve-gregory merged commit cc512e3 into cyverse:xylotomous-xenops Apr 13, 2017