iRODS transfer support for application_to_provider #318
a5d5e5b
to
df6bdfe
Compare
Tested this in a fresh local development environment. In doing so, I ran into several other issues, which ended up taking most of the day. Still, every application I migrated using the iRODS transfer completed successfully and quickly. The longest one (an AMI-based image with kernel and ramdisk image dependencies) took 2 minutes; the others all took under a minute. The script populated what was expected in iRODS, in Glance, and in the Atmosphere database. Instances do launch; at this point I can't SSH to them, but I'm fairly sure that's an unrelated problem. Removing WIP label, ready to review! If we are OK with the above caveats, I would like to get this backported to WW and hotfixed to production so I can start bulk-migrating images to the Marana cloud.
Overall looks good, could use a little cleanup in main(). I've gone ahead and merged the corresponding rtwo PR. We will need to update the requirements.txt when the irodsclient PR is merged in and available.
scripts/application_to_provider.py
Outdated
@@ -255,15 +293,42 @@ def main():

    local_storage_dir = secrets.LOCAL_STORAGE if os.path.exists(secrets.LOCAL_STORAGE) else "/tmp"
    local_path = os.path.join(local_storage_dir, sprov_img_uuid)

    def migrate_image_data(img_uuid):
        # Todo this function is in an awkward place and relies on 'global' state, unsure of best way to refactor
Yeah this makes main() pretty terse -- Let's move this function out and isolate the arguments you require.
@steve-gregory did this in commit 9440eaa.
OpenStack Glance (https://github.com/cyverse/glance-irods)
- Src. and dst. providers must store images in the same iRODS zone
- --source-provider-id, --irods-conn, --irods-src-coll, and --irods-dst-coll
  must all be defined
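The "must all be defined" rule in the help text above can be sketched as an all-or-nothing check on the parsed arguments. This is only an illustration: the four flag names come from the help text, while `build_parser`, `irods_transfer_requested`, and the exact error behavior are hypothetical and not taken from the script.

```python
import argparse

# Flag names are from the script's help text; everything else is an assumption.
REQUIRED_FOR_IRODS = ("source_provider_id", "irods_conn",
                      "irods_src_coll", "irods_dst_coll")


def build_parser():
    """Build a parser that knows only the four iRODS-related options."""
    parser = argparse.ArgumentParser(description="Sketch of iRODS option validation")
    parser.add_argument("--source-provider-id")
    parser.add_argument("--irods-conn")
    parser.add_argument("--irods-src-coll")
    parser.add_argument("--irods-dst-coll")
    return parser


def irods_transfer_requested(args):
    """Return True if iRODS transfer is fully configured, False if not requested.

    Raise ValueError when only some of the required options are present.
    """
    irods_specific = (args.irods_conn, args.irods_src_coll, args.irods_dst_coll)
    if all(value is None for value in irods_specific):
        # No iRODS option given: fall back to a pure Glance API transfer.
        return False
    missing = [name for name in REQUIRED_FOR_IRODS if getattr(args, name) is None]
    if missing:
        flags = ", ".join("--" + name.replace("_", "-") for name in missing)
        raise ValueError("iRODS transfer also requires: " + flags)
    return True
```

Passing `--source-provider-id` on its own is treated here as a normal Glance API migration, since that option is not specific to iRODS.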
Would seem that only one of the two collections would be required if someone wanted to go iRODS -> Glance or Glance -> iRODS?
Just curious how far we are away from Many-to-many relationship 😄
@steve-gregory, I don't know what you mean. Downloading an image from provider A using iRODS and then uploading it to provider B using Glance, and vice versa? I haven't built support for that. Regardless of which storage back-end is in use, why would someone want to do that over Glance's preferred way of using the API for the entire migration? The only reason to use iRODS transfer is for the speedup of iRODS data object copy, and for that you wouldn't want to be uploading or downloading image data via Glance API.
I don't know what you mean by "many-to-many relationship" in this context. Can you please clarify?
0e7079a
to
73ae24d
Compare
Made script more functional and a few other improvements. Tested to ensure it still migrates images both using iRODS transfer and pure Glance API transfer. Feel free to merge if there are no other requested changes applicable to this PR.
This PR adds support for image data transfer via iRODS data object copy, when the source and destination providers both use the iRODS storage back-end for Glance and share a common iRODS zone. Very specific to CyVerse atmosphere(0) and perhaps not likely that anyone else will use it! The benefit for us is that migrating images becomes very fast, on the order of a minute rather than 10-30+ minutes.
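A rough sketch of that transfer, in two steps: copy the data object inside the shared iRODS zone, then register the new location on the destination Glance image. The function name and parameters are hypothetical, and it assumes each image is stored as one data object named after its Glance UUID. The location URL is passed in by the caller because its format depends on the glance-irods store. The two client calls used are `data_objects.copy` from python-irodsclient and `images.add_location` from the python-glanceclient v2 API.

```python
def irods_copy_and_register(irods_session, glance_client,
                            src_img_uuid, dst_img_uuid,
                            src_coll, dst_coll, dst_location_url):
    """Copy image data inside iRODS, then point the Glance image at the copy.

    irods_session is expected to behave like irods.session.iRODSSession and
    glance_client like a glanceclient v2 Client (both are duck-typed here).
    """
    # Assumption: one data object per image, named after the image UUID.
    src_path = "{}/{}".format(src_coll.rstrip("/"), src_img_uuid)
    dst_path = "{}/{}".format(dst_coll.rstrip("/"), dst_img_uuid)

    # Copy within the shared zone; image data is not downloaded or
    # re-uploaded through the Glance API.
    irods_session.data_objects.copy(src_path, dst_path)

    # Out-of-band registration: tell Glance where the image data now lives.
    glance_client.images.add_location(dst_img_uuid, dst_location_url, {})
    return dst_path
```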
Using iRODS to transfer image data requires us to reach "around" the Glance API, populating the image data in iRODS out-of-band, and then adding the location to the image object in the Glance API. Unfortunately, using Glance in this way has some caveats:
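- [...] (show_multiple_locations) which presents a security issue for us: it would expose a connection string to users which contains credentials for the iRODS service account used by the Glance server. For now, commenting out three lines of code fixes this limitation. Such a patch will no longer be necessary in the OpenStack Pike release, when show_multiple_locations is deprecated and instead we can use policy.json to control who can set/get image locations.
- [...] None. application_to_provider.py can no longer use checksums to determine if image data needs to be transferred. It needs to use the image size, which is not a strong assurance that the bits match.

Checklist before merging
- ideally get irods/python-irodsclient #67 merged so we don't need to use my fork (not waiting for this to happen)
- Documentation created/updated at Example link to documentation to give context to the feature (script already referenced here and iRODS support options are documented in script help)

Checklist after merging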
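The size-based check from the caveats can be sketched as follows. This is not the script's code: the function name is hypothetical, images are treated as plain dicts with `checksum` and `size` keys, and preferring checksums when both sides have one is an assumption added for illustration.

```python
def image_data_needs_transfer(src_img, dst_img):
    """Decide whether image data must be (re)transferred to the destination.

    Both arguments are dict-like image records with optional 'checksum'
    and 'size' entries (an assumption about their shape).
    """
    src_sum, dst_sum = src_img.get("checksum"), dst_img.get("checksum")
    if src_sum is not None and dst_sum is not None:
        # Strong check: both sides have a checksum to compare.
        return src_sum != dst_sum

    src_size, dst_size = src_img.get("size"), dst_img.get("size")
    if src_size is None or dst_size is None:
        # Nothing to compare against, so assume the data is missing.
        return True

    # Weak check: equal sizes do not prove the bits match.
    return src_size != dst_size
```

Because matching sizes are only weak evidence, a caller that needs certainty would have to verify the data some other way.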
ideally get irods/python-irodsclient #67 merged so we don't need to use my fork(not waiting for this to happen)Documentation created/updated at Example link to documentation to give context to the feature(script already referenced here and iRODS support options are documented in script help)Checklist after merging