Merge pull request #10201 from GlobalDataverseCommunityConsortium/globusstore

IQSS/10200 - Create dirs needed for Globus transfer to managed stores
pdurbin committed Jan 11, 2024
2 parents b52fede + 8312001 commit 39794d5
Showing 3 changed files with 106 additions and 26 deletions.
16 changes: 10 additions & 6 deletions doc/sphinx-guides/source/developers/globus-api.rst
@@ -1,7 +1,11 @@
Globus Transfer API
===================

+.. contents:: |toctitle|
+  :local:
+
The Globus API addresses three use cases:
+
* Transfer to a Dataverse-managed Globus endpoint (File-based or using the Globus S3 Connector)
* Reference of files that will remain in a remote Globus endpoint
* Transfer from a Dataverse-managed Globus endpoint
@@ -68,7 +72,7 @@ The response includes the id for the Globus endpoint to use along with several s

The getDatasetMetadata and getFileListing URLs are just signed versions of the standard Dataset metadata and file listing API calls. The other two are Globus specific.

-If called for a dataset using a store that is configured with a remote Globus endpoint(s), the return response is similar but the response includes a
+If called for, a dataset using a store that is configured with a remote Globus endpoint(s), the return response is similar but the response includes a
the "managed" parameter will be false, the "endpoint" parameter is replaced with a JSON array of "referenceEndpointsWithPaths" and the
requestGlobusTransferPaths and addGlobusFiles URLs are replaced with ones for requestGlobusReferencePaths and addFiles. All of these calls are
described further below.
@@ -87,7 +91,7 @@ The returned response includes the same getDatasetMetadata and getFileListing UR
Performing an Upload/Transfer In
--------------------------------

-The information from the API call above can be used to provide a user with information about the dataset and to prepare to transfer or to reference files (based on the "managed" parameter).
+The information from the API call above can be used to provide a user with information about the dataset and to prepare to transfer (managed=true) or to reference files (managed=false).

Once the user identifies which files are to be added, the requestGlobusTransferPaths or requestGlobusReferencePaths URLs can be called. These both reference the same API call but must be used with different entries in the JSON body sent:

@@ -98,7 +102,7 @@ Once the user identifies which files are to be added, the requestGlobusTransferP
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export LOCALE=en-US
-curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusUpload"
+curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths"
Note that when using the dataverse-globus app or the return from the previous call, the URL for this call will be signed and no API_TOKEN is needed.

@@ -153,7 +157,7 @@ In the remote/reference case, the map is from the initially supplied endpoint/pa
Adding Files to the Dataset
---------------------------

-In the managed case, once a Globus transfer has been initiated a final API call is made to Dataverse to provide it with the task identifier of the transfer and information about the files being transferred:
+In the managed case, you must initiate a Globus transfer and take note of its task identifier. As in the JSON example below, you will pass it as ``taskIdentifier`` along with details about the files you are transferring:

.. code-block:: bash
@@ -164,9 +168,9 @@ In the managed case, once a Globus transfer has been initiated a final API call
"files": [{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b3972213f-f6b5c2221423", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "1234"}}, \
{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b39722140-50eb7d3c5ece", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "2345"}}]}'
-curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles -F "jsonData=$JSON_DATA"
+curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles" -F "jsonData=$JSON_DATA"
-Note that the mimetype is multipart/form-data, matching the /addFiles API call. ALso note that the API_TOKEN is not needed when using a signed URL.
+Note that the mimetype is multipart/form-data, matching the /addFiles API call. Also note that the API_TOKEN is not needed when using a signed URL.

With this information, Dataverse will begin to monitor the transfer and when it completes, will add all files for which the transfer succeeded.
As the transfer can take significant time and the API call is asynchronous, the only way to determine if the transfer succeeded via API is to use the standard calls to check the dataset lock state and contents.
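
A note on the last paragraph of the documentation change: because the addGlobusFiles call returns before the transfer finishes, a client has to poll to learn the outcome. The sketch below shows one possible shape for that loop in Java. `GlobusLockPoller`, `waitForUnlock`, and the `lockCount` supplier are hypothetical names and not part of Dataverse; `lockCount` stands in for whatever call the client uses to read the dataset's lock state, and is left abstract so the example runs without a server.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

/** Hypothetical client-side helper; not part of Dataverse. */
final class GlobusLockPoller {

    /**
     * Polls lockCount until it reports zero locks or maxAttempts is reached.
     * lockCount is an assumption of this sketch: it should return how many
     * locks the dataset currently holds.
     *
     * @return true if the locks cleared, false if we gave up or were interrupted
     */
    static boolean waitForUnlock(IntSupplier lockCount, int maxAttempts, Duration pause) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (lockCount.getAsInt() == 0) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException ex) {
                    // Preserve the interrupt for the caller and stop polling.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake lock source for demonstration: locked for two polls, then free.
        int[] remaining = {2};
        boolean unlocked = waitForUnlock(
                () -> remaining[0] > 0 ? remaining[0]-- : 0, 5, Duration.ofMillis(10));
        System.out.println("unlocked=" + unlocked);
    }
}
```

Cleared locks only indicate that Dataverse has finished processing. As the text above says, the dataset contents still have to be checked to see which files were actually added.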
108 changes: 88 additions & 20 deletions src/main/java/edu/harvard/iq/dataverse/globus/GlobusServiceBean.java
@@ -134,7 +134,7 @@ private String getRuleId(GlobusEndpoint endpoint, String principal, String permi
* @param globusLogger - a separate logger instance, may be null
*/
public void deletePermission(String ruleId, Dataset dataset, Logger globusLogger) {
-globusLogger.info("Start deleting rule " + ruleId + " for dataset " + dataset.getId());
+globusLogger.fine("Start deleting rule " + ruleId + " for dataset " + dataset.getId());
if (ruleId.length() > 0) {
if (dataset != null) {
GlobusEndpoint endpoint = getGlobusEndpoint(dataset);
@@ -179,25 +179,95 @@ public JsonObject requestAccessiblePaths(String principal, Dataset dataset, int
permissions.setPrincipal(principal);
permissions.setPath(endpoint.getBasePath() + "/");
permissions.setPermissions("rw");

JsonObjectBuilder response = Json.createObjectBuilder();
-response.add("status", requestPermission(endpoint, dataset, permissions));
-String driverId = dataset.getEffectiveStorageDriverId();
-JsonObjectBuilder paths = Json.createObjectBuilder();
-for (int i = 0; i < numberOfPaths; i++) {
-String storageIdentifier = DataAccess.getNewStorageIdentifier(driverId);
-int lastIndex = Math.max(storageIdentifier.lastIndexOf("/"), storageIdentifier.lastIndexOf(":"));
-paths.add(storageIdentifier, endpoint.getBasePath() + "/" + storageIdentifier.substring(lastIndex + 1));
+//Try to create the directory (202 status) if it does not exist (502-already exists)
+int mkDirStatus = makeDirs(endpoint, dataset);
+if (!(mkDirStatus== 202 || mkDirStatus == 502)) {
+return response.add("status", mkDirStatus).build();
+}
+//The dir for the dataset's data exists, so try to request permission for the principal
+int requestPermStatus = requestPermission(endpoint, dataset, permissions);
+response.add("status", requestPermStatus);
+if (requestPermStatus == 201) {
+String driverId = dataset.getEffectiveStorageDriverId();
+JsonObjectBuilder paths = Json.createObjectBuilder();
+for (int i = 0; i < numberOfPaths; i++) {
+String storageIdentifier = DataAccess.getNewStorageIdentifier(driverId);
+int lastIndex = Math.max(storageIdentifier.lastIndexOf("/"), storageIdentifier.lastIndexOf(":"));
+paths.add(storageIdentifier, endpoint.getBasePath() + "/" + storageIdentifier.substring(lastIndex + 1));

+}
+response.add("paths", paths.build());
}
-response.add("paths", paths.build());
return response.build();
}

+/**
+* Call to create the directories for the specified dataset.
+*
+* @param dataset
+* @return - an error status at whichever subdir the process fails at or the
+* final success status
+*/
+private int makeDirs(GlobusEndpoint endpoint, Dataset dataset) {
+logger.fine("Creating dirs: " + endpoint.getBasePath());
+int index = endpoint.getBasePath().lastIndexOf(dataset.getAuthorityForFileStorage())
++ dataset.getAuthorityForFileStorage().length();
+String nextDir = endpoint.getBasePath().substring(0, index);
+int response = makeDir(endpoint, nextDir);
+String identifier = dataset.getIdentifierForFileStorage();
+//Usually identifiers will have 0 or 1 slashes (e.g. FK2/ABCDEF) but the while loop will handle any that could have more
+//Will skip if the first makeDir above failed
+while ((identifier.length() > 0) && ((response == 202 || response == 502))) {
+index = identifier.indexOf('/');
+if (index == -1) {
+//Last dir to create
+response = makeDir(endpoint, nextDir + "/" + identifier);
+identifier = "";
+} else {
+//The next dir to create
+nextDir = nextDir + "/" + identifier.substring(0, index);
+response = makeDir(endpoint, nextDir);
+//The rest of the identifier
+identifier = identifier.substring(index + 1);
+}
+}
+return response;
+}
+
+private int makeDir(GlobusEndpoint endpoint, String dir) {
+MakeRequestResponse result = null;
+String body = "{\"DATA_TYPE\":\"mkdir\",\"path\":\"" + dir + "\"}";
+try {
+logger.fine(body);
+URL url = new URL(
+"https://transfer.api.globusonline.org/v0.10/operation/endpoint/" + endpoint.getId() + "/mkdir");
+result = makeRequest(url, "Bearer", endpoint.getClientToken(), "POST", body);
+
+switch (result.status) {
+case 202:
+logger.fine("Dir " + dir + " was created successfully.");
+break;
+case 502:
+logger.fine("Dir " + dir + " already exists.");
+break;
+default:
+logger.warning("Status " + result.status + " received when creating dir " + dir);
+logger.fine("Response: " + result.jsonResponse);
+}
+} catch (MalformedURLException ex) {
+// Misconfiguration
+logger.warning("Failed to create dir on " + endpoint.getId());
+return 500;
+}
+return result.status;
+}
+
private int requestPermission(GlobusEndpoint endpoint, Dataset dataset, Permissions permissions) {
Gson gson = new GsonBuilder().create();
MakeRequestResponse result = null;
-logger.info("Start creating the rule");
+logger.fine("Start creating the rule");

try {
URL url = new URL("https://transfer.api.globusonline.org/v0.10/endpoint/" + endpoint.getId() + "/access");
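
To see which directories the new makeDirs method would ask Globus to create, its path arithmetic can be pulled out into a pure function. This is a sketch rather than code from the commit: `GlobusDirPlan` and `dirsToCreate` are hypothetical names, the base path in `main` is made up, and it assumes (as makeDirs does) that the endpoint base path contains the dataset's authority. Unlike makeDirs, it does not stop at the first failed mkdir; it only lists the paths in the order they would be attempted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone version of the path logic in makeDirs. */
final class GlobusDirPlan {

    /**
     * Returns the directories in creation order: first the base path up to and
     * including the authority, then one entry per slash-separated identifier part.
     * Assumes basePath contains authority; the result is meaningless otherwise.
     */
    static List<String> dirsToCreate(String basePath, String authority, String identifier) {
        List<String> dirs = new ArrayList<>();
        int index = basePath.lastIndexOf(authority) + authority.length();
        String nextDir = basePath.substring(0, index);
        dirs.add(nextDir);

        String rest = identifier;
        while (!rest.isEmpty()) {
            int slash = rest.indexOf('/');
            if (slash == -1) {
                // Last directory to create
                dirs.add(nextDir + "/" + rest);
                rest = "";
            } else {
                // Next intermediate directory, then continue with the remainder
                nextDir = nextDir + "/" + rest.substring(0, slash);
                dirs.add(nextDir);
                rest = rest.substring(slash + 1);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Made-up base path for a dataset with identifier FK2/ABCDEF
        System.out.println(dirsToCreate("/data/10.5072/FK2/ABCDEF", "10.5072", "FK2/ABCDEF"));
    }
}
```

For an identifier with one slash this yields three mkdir calls, which matches the comment in makeDirs that identifiers usually have zero or one slash.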
@@ -218,7 +288,7 @@ private int requestPermission(GlobusEndpoint endpoint, Dataset dataset, Permissi
if (globusResponse != null && globusResponse.containsKey("access_id")) {
permissions.setId(globusResponse.getString("access_id"));
monitorTemporaryPermissions(permissions.getId(), dataset.getId());
-logger.info("Access rule " + permissions.getId() + " was created successfully");
+logger.fine("Access rule " + permissions.getId() + " was created successfully");
} else {
// Shouldn't happen!
logger.warning("Access rule id not returned for dataset " + dataset.getId());
@@ -363,7 +433,6 @@ private static MakeRequestResponse makeRequest(URL url, String authType, String
try {
connection = (HttpURLConnection) url.openConnection();
// Basic
-logger.info(authType + " " + authCode);
logger.fine("For URL: " + url.toString());
connection.setRequestProperty("Authorization", authType + " " + authCode);
// connection.setRequestProperty("Content-Type",
@@ -713,7 +782,7 @@ public void globusUpload(JsonObject jsonData, ApiToken token, Dataset dataset, S
.mapToObj(index -> ((JsonObject) newfilesJsonArray.get(index)).getJsonObject(fileId))
.filter(Objects::nonNull).collect(Collectors.toList());
if (newfileJsonObject != null) {
-logger.info("List Size: " + newfileJsonObject.size());
+logger.fine("List Size: " + newfileJsonObject.size());
// if (!newfileJsonObject.get(0).getString("hash").equalsIgnoreCase("null")) {
JsonPatch path = Json.createPatchBuilder()
.add("/md5Hash", newfileJsonObject.get(0).getString("hash")).build();
@@ -884,18 +953,18 @@ public void globusDownload(String jsonData, Dataset dataset, User authUser) thro
String taskIdentifier = jsonObject.getString("taskIdentifier");

GlobusEndpoint endpoint = getGlobusEndpoint(dataset);
-logger.info("Endpoint path: " + endpoint.getBasePath());
+logger.fine("Endpoint path: " + endpoint.getBasePath());

// If the rules_cache times out, the permission will be deleted. Presumably that
// doesn't affect a
// globus task status check
GlobusTask task = getTask(endpoint.getClientToken(), taskIdentifier, globusLogger);
String ruleId = getRuleId(endpoint, task.getOwner_id(), "r");
if (ruleId != null) {
-logger.info("Found rule: " + ruleId);
+logger.fine("Found rule: " + ruleId);
Long datasetId = rulesCache.getIfPresent(ruleId);
if (datasetId != null) {
-logger.info("Deleting from cache: rule: " + ruleId);
+logger.fine("Deleting from cache: rule: " + ruleId);
// Will not delete rule
rulesCache.invalidate(ruleId);
}
@@ -909,7 +978,7 @@ public void globusDownload(String jsonData, Dataset dataset, User authUser) thro

// Transfer is done (success or failure) so delete the rule
if (ruleId != null) {
-logger.info("Deleting: rule: " + ruleId);
+logger.fine("Deleting: rule: " + ruleId);
deletePermission(ruleId, dataset, globusLogger);
}

@@ -1032,7 +1101,6 @@ public JsonObject calculateMissingMetadataFields(List<String> inputList, Logger
}

private CompletableFuture<FileDetailsHolder> calculateDetailsAsync(String id, Logger globusLogger) {
-// logger.info(" calcualte additional details for these globus id ==== " + id);

return CompletableFuture.supplyAsync(() -> {
try {
@@ -1071,7 +1139,7 @@ private FileDetailsHolder calculateDetails(String id, Logger globusLogger)
count = 3;
} catch (IOException ioex) {
count = 3;
-logger.info(ioex.getMessage());
+logger.fine(ioex.getMessage());
globusLogger.info(
"DataFile (fullPath " + fullPath + ") does not appear to be accessible within Dataverse: ");
} catch (Exception ex) {
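
The behavioural change to requestAccessiblePaths in this file can be summarised with a small stand-alone sketch: directories are created first, a failed mkdir status is returned immediately, and "paths" is only included when the permission request returns 201. `AccessiblePathsFlow`, `respond`, and the two `IntSupplier` parameters are hypothetical stand-ins for the makeDirs and requestPermission calls, and a plain map replaces the JsonObjectBuilder so the example has no dependencies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical model of the status handling; not the Dataverse implementation. */
final class AccessiblePathsFlow {

    static Map<String, Object> respond(IntSupplier makeDirs,
                                       IntSupplier requestPermission,
                                       List<String> paths) {
        Map<String, Object> response = new LinkedHashMap<>();

        // 202 = directory created, 502 = directory already exists; anything else aborts.
        int mkDirStatus = makeDirs.getAsInt();
        if (!(mkDirStatus == 202 || mkDirStatus == 502)) {
            response.put("status", mkDirStatus);
            return response;
        }

        // Only ask for the access rule once the directory is known to exist.
        int permStatus = requestPermission.getAsInt();
        response.put("status", permStatus);
        if (permStatus == 201) {
            response.put("paths", paths);
        }
        return response;
    }

    public static void main(String[] args) {
        System.out.println(respond(() -> 502, () -> 201, List.of("example-path")));
    }
}
```

One consequence for callers is that a response without "paths" is now possible, so the "status" value should be checked before reading the paths.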
8 changes: 8 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
@@ -3725,4 +3725,12 @@ static Response requestGlobusDownload(Integer datasetId, JsonObject body, String
.post("/api/datasets/" + datasetId + "/requestGlobusDownload");
}

+static Response requestGlobusUploadPaths(Integer datasetId, JsonObject body, String apiToken) {
+return given()
+.header(API_TOKEN_HTTP_HEADER, apiToken)
+.body(body.toString())
+.contentType("application/json")
+.post("/api/datasets/" + datasetId + "/requestGlobusUploadPaths");
+}
+
}
