This repository has been archived by the owner on Sep 12, 2022. It is now read-only.

iRODS transfer support for application_to_provider #318

Merged
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -1,7 +1,9 @@
## [Whimsical-Wyvern](https://github.com/cyverse/atmosphere/milestone/10?closed=1) (as of 3/21/2017)
## [Whimsical-Wyvern](https://github.com/cyverse/atmosphere/milestone/10?closed=1) (as of 4/6/2017)

Features:
- Include sentry.io error reporting for production environments
- [application_to_provider](https://github.com/cyverse/atmosphere/pull/284) migration script
- [iRODS transfer support](https://github.com/cyverse/atmosphere/pull/318) for application_to_provider script

Improvements:
- Improved support for Instance Actions in v2 APIs
1 change: 1 addition & 0 deletions requirements.in
@@ -1,4 +1,5 @@
-e git+https://github.com/steve-gregory/billiard.git#egg=billiard # TEMPORARY
-e git+https://github.com/c-mart/python-irodsclient.git@data-object-copy#egg=python-irodsclient # Temporary until https://github.com/irods/python-irodsclient/pull/67 is merged

Django==1.10.6
django-cors-headers==0.12.0
5 changes: 4 additions & 1 deletion requirements.txt
@@ -5,6 +5,8 @@
# pip-compile --output-file requirements.txt requirements.in
#
-e git+https://github.com/steve-gregory/billiard.git#egg=billiard
-e git+https://github.com/c-mart/python-irodsclient.git@data-object-copy#egg=python-irodsclient

amqp==2.1.4 # via kombu
ansible==2.2.1.0 # via subspace
apache-libcloud==0.20.1
@@ -90,7 +92,8 @@ pyparsing==2.2.0 # via cliff, cmd2, oslo.utils, packaging
python-cinderclient==1.9.0 # via python-openstackclient, rtwo
python-dateutil==2.6.0
python-glanceclient==2.5.0 # via python-openstackclient, rtwo
python-irodsclient==0.4.0 # via rtwo
# python-irodsclient commented out until https://github.com/irods/python-irodsclient/pull/67 and https://github.com/cyverse/rtwo/pull/10 are merged
# python-irodsclient==0.4.0 # via rtwo
python-keystoneclient==3.6.0 # via django-cyverse-auth, python-glanceclient, python-openstackclient, rtwo
python-ldap==2.4.19
python-logstash==0.4.5
135 changes: 114 additions & 21 deletions scripts/application_to_provider.py
@@ -5,9 +5,11 @@
import logging
import os
import sys
import urlparse

import OpenSSL.SSL

from irods.session import iRODSSession
import glanceclient.exc
import django; django.setup()
import core.models
@@ -21,7 +23,9 @@

- Creates Glance image
- Populates Glance image metadata
- Transfers image data from existing provider using Glance API
- Transfers image data from existing provider
  - Using Glance API (default)
  - Optionally, using iRODS (Atmosphere(0)-specific feature)
- If Application uses an AMI-style image, ensures the
kernel (AKI) and ramdisk (ARI) images are also present on destination
provider, and sets appropriate properties
@@ -36,6 +40,26 @@
If a non-public application has one or more members without identities on the
destination provider, script will exit with error unless
--ignore_missing_members is set.

The iRODS transfer feature was developed for CyVerse Atmosphere(0); may be of
limited use elsewhere. In order to use it:
- Source and destination providers must use the iRODS storage backend for
OpenStack Glance (https://github.com/cyverse/glance-irods)
- Src. and dst. providers must store images in the same iRODS zone
- --source-provider-id, --irods-conn, --irods-src-coll, and --irods-dst-coll
must all be defined
Contributor

It would seem that only one of the two collections would be required if someone wanted to go iRODS -> Glance or Glance -> iRODS?

Just curious how far away we are from a many-to-many relationship 😄

Contributor Author

@c-mart, Apr 10, 2017

@steve-gregory, I don't know what you mean. Downloading an image from provider A using iRODS and then uploading it to provider B using Glance, and vice versa? I haven't built support for that. Regardless of which storage back-end is in use, why would someone want to do that over Glance's preferred way of using the API for the entire migration? The only reason to use iRODS transfer is for the speedup of iRODS data object copy, and for that you wouldn't want to be uploading or downloading image data via Glance API.

I don't know what you mean by "many-to-many relationship" in this context. Can you please clarify?

- Credentials passed in --irods-conn must have write access to both source and
destination collections

Considerations when using iRODS transfer:
- The credentials passed in --irods-conn will be used to populate the image
location in the Glance database on the destination provider. Consider passing
the iRODS credentials already in use for the Glance iRODS back-end on that
provider, and making the source collection readable to same.
- This script does not set data object permissions in iRODS. This means that
for the destination provider, the iRODS account used by Glance server should
have write (or own) access to the destination collection (where new data
objects are created), and *inheritance should be enabled*.
"""

max_tries = 3 # Maximum number of times to attempt downloading and uploading image data
@@ -44,8 +68,22 @@
def main():
    args = _parse_args()
    logging.info("Running application_to_provider with the following arguments:\n{0}".format(str(args)))
    if args.irods_xfer:
        raise NotImplementedError("iRODS transfer not built yet")

    irods_args = (args.irods_conn, args.irods_src_coll, args.irods_dst_coll)
    if any(irods_args):
        irods = True
        if all(irods_args) and args.source_provider_id:
            irods_conn = _parse_irods_conn(args.irods_conn)
            irods_src_coll = args.irods_src_coll
            irods_dst_coll = args.irods_dst_coll
        else:
            raise Exception("If using iRODS transfer then --source-provider-id, --irods-conn, --irods-src-coll, and "
                            "--irods-dst-coll must all be defined")
    else:
        irods = False

    persist_local_cache = True if args.persist_local_cache else False

    if args.source_provider_id == args.destination_provider_id:
        raise Exception("Source provider cannot be the same as destination provider")
    app = core.models.Application.objects.get(id=args.application_id)
@@ -255,15 +293,42 @@ def main():

    local_storage_dir = secrets.LOCAL_STORAGE if os.path.exists(secrets.LOCAL_STORAGE) else "/tmp"
    local_path = os.path.join(local_storage_dir, sprov_img_uuid)

    def migrate_image_data(img_uuid):
        # Todo this function is in an awkward place and relies on 'global' state, unsure of best way to refactor
Contributor

Yeah this makes main() pretty terse -- Let's move this function out and isolate the arguments you require.

Contributor Author

@steve-gregory did this in commit 9440eaa.

        """
        Ensures that Glance image data matches between a source and a destination OpenStack provider.
        Migrates image data if needed, using either Glance API download/upload or iRODS data object copy.
        Args:
            img_uuid: UUID of image to be migrated

        Returns: True if successful, else raises exception
        """

        src_img = sprov_glance_client.images.get(img_uuid)
        dst_img = dprov_glance_client.images.get(img_uuid)
        if irods:
            # Unable to use checksum for irods transfer, because checksum is not set in Glance when a location
            # is added to an image (instead of uploading image data via Glance API) :(
            if src_img.size == dst_img.size:
                logging.info("Image data size matches on source and destination providers, not migrating data")
                return True
            else:
                migrate_image_data_irods(dprov_glance_client, irods_conn, irods_src_coll, irods_dst_coll, img_uuid)
        else:
            if src_img.checksum == dst_img.checksum:
                logging.info("Image data checksum matches on source and destination providers, not migrating data")
                return True
            else:
                migrate_image_data_glance(sprov_glance_client, dprov_glance_client, img_uuid, local_path,
                                          persist_local_cache)

    # Populate image data in destination provider if needed
    migrate_image_data(sprov_glance_client, dprov_glance_client, sprov_img_uuid, local_path,
                       persist_local_cache=args.persist_local_cache)
    migrate_image_data(sprov_img_uuid)
    # If AMI-based image, populate image data in destination provider if needed
    if ami:
        migrate_image_data(sprov_glance_client, dprov_glance_client, sprov_aki_glance_image.id, local_path,
                           persist_local_cache=args.persist_local_cache)
        migrate_image_data(sprov_glance_client, dprov_glance_client, sprov_ari_glance_image.id, local_path,
                           persist_local_cache=args.persist_local_cache)
        migrate_image_data(sprov_aki_glance_image.id)
        migrate_image_data(sprov_ari_glance_image.id)
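
The skip-or-migrate decision made by the nested `migrate_image_data` can be sketched as a standalone function. This is only an illustration under stated assumptions, not code from the PR: `needs_migration` and the `Img` namedtuple are hypothetical names, and `Img` stands in for a Glance image object that exposes `size` and `checksum`.

```python
from collections import namedtuple

# Hypothetical stand-in for a Glance image record; only the two fields
# that the comparison needs are modelled here.
Img = namedtuple("Img", ["size", "checksum"])


def needs_migration(src_img, dst_img, use_irods):
    """Return True if image data should be copied to the destination.

    With iRODS transfer, Glance does not set a checksum when a location is
    added to an image, so sizes are compared instead. Otherwise checksums
    are compared.
    """
    if use_irods:
        return src_img.size != dst_img.size
    return src_img.checksum != dst_img.checksum


src = Img(size=1024, checksum="abc")
# Destination image as it might look after an iRODS copy: same size, no checksum
dst = Img(size=1024, checksum=None)

print(needs_migration(src, dst, use_irods=True))
print(needs_migration(src, dst, use_irods=False))
```

Comparing sizes is a weaker check than comparing checksums, which is the trade-off the code comment in the diff acknowledges.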


def file_md5(path):
@@ -292,10 +357,9 @@ def get_or_create_glance_image(glance_client, img_uuid):
    return glance_image


def migrate_image_data(src_glance_client, dst_glance_client, img_uuid, local_path, persist_local_cache=True, max_tries=3):
def migrate_image_data_glance(src_glance_client, dst_glance_client, img_uuid, local_path, persist_local_cache=True, max_tries=3):
    """
    Ensures that Glance image data matches between a source and a destination OpenStack provider. Migrates image data
    if needed. Assumes that:
    Migrates image data using Glance API. Assumes that:
    - The Glance image object has already been created in the source provider
    - The Glance image UUIDs match between providers

@@ -311,11 +375,6 @@ def migrate_image_data(src_glance_client, dst_glance_client, img_uuid, local_pat
    Returns: True if success, else raises an exception
    """
    src_img = src_glance_client.images.get(img_uuid)
    dst_img = dst_glance_client.images.get(img_uuid)
    if src_img.checksum == dst_img.checksum:
        logging.info("Image data checksum matches on source and destination providers, not migrating data")
        return True
    logging.info("Migrating image data because checksums don't match between source and destination providers")

    # Download image from source provider, only if there is no correct local copy
    if os.path.exists(local_path) and file_md5(local_path) == src_img.checksum:
@@ -362,6 +421,35 @@ def migrate_image_data(src_glance_client, dst_glance_client, img_uuid, local_pat
    return True
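
The local-cache check in `migrate_image_data_glance` (reuse a downloaded file only if its MD5 matches the source checksum) can be sketched as follows. The body of the script's own `file_md5` is not shown in this diff, so `file_md5_sketch` and `local_copy_is_valid` are hypothetical names and the chunked read is an assumption about how such a helper might work.

```python
import hashlib
import os
import tempfile


def file_md5_sketch(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_copy_is_valid(path, expected_md5):
    """True if a cached file exists and its MD5 matches the expected checksum."""
    return os.path.exists(path) and file_md5_sketch(path) == expected_md5


# Usage with a throwaway file standing in for a cached image
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"image bytes")
    cached_path = tmp.name

expected = hashlib.md5(b"image bytes").hexdigest()
hit = local_copy_is_valid(cached_path, expected)
miss = local_copy_is_valid(cached_path, "0" * 32)
os.remove(cached_path)

print(hit, miss)
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte image files.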


def migrate_image_data_irods(dst_glance_client, irods_conn, irods_src_coll, irods_dst_coll, img_uuid):
    sess = iRODSSession(host=irods_conn.get('host'),
                        port=irods_conn.get('port'),
                        zone=irods_conn.get('zone'),
                        user=irods_conn.get('username'),
                        password=irods_conn.get('password'))
    src_data_obj_path = os.path.join(irods_src_coll, img_uuid)
    dst_data_obj_path = os.path.join(irods_dst_coll, img_uuid)
    print(src_data_obj_path, dst_data_obj_path)
    sess.data_objects.copy(src_data_obj_path, dst_data_obj_path)
    logging.info("Copied image data to destination collection in iRODS")
    dst_img_location = "irods://{0}:{1}@{2}:{3}{4}".format(
        irods_conn.get('username'),
        irods_conn.get('password'),
        irods_conn.get('host'),
        irods_conn.get('port'),
        dst_data_obj_path
    )
    # Assumption that iRODS copy will always be correct+complete, not inspecting checksums afterward?
    dst_glance_client.images.add_location(img_uuid, dst_img_location, dict())
    logging.info("Set image location in Glance")
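
How the image location string is assembled can be shown in isolation. This is a sketch under assumptions rather than the PR's code: `build_irods_location` is a hypothetical helper, the connection values are made-up examples, and `posixpath.join` is substituted for `os.path.join` so that the iRODS path uses forward slashes on any operating system.

```python
import posixpath


def build_irods_location(conn, collection, img_uuid):
    """Build an irods:// location URL for an image data object.

    conn is a dict with username, password, host and port keys, the same
    shape that the script's _parse_irods_conn returns.
    """
    # iRODS logical paths always use forward slashes, so posixpath is used
    # here regardless of the platform running the script.
    data_obj_path = posixpath.join(collection, img_uuid)
    return "irods://{0}:{1}@{2}:{3}{4}".format(
        conn["username"], conn["password"], conn["host"], conn["port"], data_obj_path
    )


# Hypothetical example values, not taken from any real deployment
conn = {"username": "glance", "password": "secret", "host": "irods.example.org", "port": 1247}
location = build_irods_location(conn, "/exampleZone/glance/dst", "1234-abcd")
print(location)
# irods://glance:secret@irods.example.org:1247/exampleZone/glance/dst/1234-abcd
```

Because the password is embedded in the location, it ends up stored with the image in Glance, which is why the script's description suggests passing the credentials that the destination's Glance iRODS back-end already uses.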


def _parse_irods_conn(irods_conn_str):
    u = urlparse.urlparse(irods_conn_str)
    irods_conn = {"username": u.username, "password": u.password, "host": u.hostname, "port": u.port, "zone": u.path[1:]}
    return irods_conn
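
For readers on Python 3, the same parsing can be sketched with `urllib.parse` (the script imports the Python 2 `urlparse` module). `parse_irods_conn` is a hypothetical name and the connection string is a made-up example.

```python
from urllib.parse import urlparse


def parse_irods_conn(irods_conn_str):
    """Split irods://user:password@host:port/zone into its components."""
    u = urlparse(irods_conn_str)
    return {
        "username": u.username,
        "password": u.password,
        "host": u.hostname,
        "port": u.port,  # int, or None if the string has no port
        "zone": u.path[1:],  # drop the leading "/" from the path
    }


conn = parse_irods_conn("irods://alice:s3cret@irods.example.org:1247/exampleZone")
print(conn)
```

Any component missing from the string comes back as `None` (or an empty string for the zone), so callers may want to validate the result before opening a session.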


def _parse_args():
    parser = argparse.ArgumentParser(description=description, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("application_id", type=int, help="Application ID to be migrated")
@@ -370,10 +458,6 @@ def _parse_args():
                        type=int,
                        help="Migrate image from source provider with this ID (else a source provider will be chosen "
                             "automatically")
    parser.add_argument("--irods-xfer",
                        action="store_true",
                        help="Transfer image data using iRODS instead of glance download/upload "
                             "(Atmosphere(0)-specific feature), not yet implemented")
    parser.add_argument("--ignore-missing-owner",
                        action="store_true",
                        help="Transfer image if application owner has no identity on destination provider (owner will "
@@ -387,6 +471,15 @@ def _parse_args():
                        help="If image download succeeds but upload fails, keep local cached copy for subsequent "
                             "attempt. (Local cache is always deleted after successful upload). "
                             "May consume a lot of disk space.")
    parser.add_argument("--irods-conn",
                        type=str,
                        help="iRODS connection string in the form of irods://user:password@host:port/zone")
    parser.add_argument("--irods-src-coll",
                        type=str,
                        help="Collection for iRODS images on source provider")
    parser.add_argument("--irods-dst-coll",
                        type=str,
                        help="Collection for iRODS images on destination provider")
    args = parser.parse_args()
    return args
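
The all-or-nothing rule for the iRODS options (checked in `main()` in this PR) could also be enforced at parse time. The sketch below is one possible arrangement, not what the PR does: `parse_args_sketch` and the `use_irods` attribute are hypothetical, and only the options relevant to the rule are declared.

```python
import argparse


def parse_args_sketch(argv):
    """Parse a reduced option set and enforce the iRODS all-or-nothing rule."""
    parser = argparse.ArgumentParser()
    parser.add_argument("application_id", type=int)
    parser.add_argument("--source-provider-id", type=int)
    parser.add_argument("--irods-conn", type=str)
    parser.add_argument("--irods-src-coll", type=str)
    parser.add_argument("--irods-dst-coll", type=str)
    args = parser.parse_args(argv)

    irods_args = (args.irods_conn, args.irods_src_coll, args.irods_dst_coll)
    if any(irods_args) and not (all(irods_args) and args.source_provider_id):
        # parser.error prints the usage message and exits with status 2
        parser.error("If using iRODS transfer then --source-provider-id, --irods-conn, "
                     "--irods-src-coll, and --irods-dst-coll must all be defined")
    # Hypothetical convenience flag so callers do not repeat the any() check
    args.use_irods = any(irods_args)
    return args


args = parse_args_sketch([
    "7",
    "--source-provider-id", "4",
    "--irods-conn", "irods://user:password@irods.example.org:1247/exampleZone",
    "--irods-src-coll", "/exampleZone/src",
    "--irods-dst-coll", "/exampleZone/dst",
])
print(args.use_irods)
```

Failing inside the parser gives the user a usage message instead of a traceback, at the cost of moving validation away from the code that consumes the values.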
