-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GDCC/8605-add-archival-status-support #8696
GDCC/8605-add-archival-status-support #8696
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't run the code yet but here's some initial feedback.
} | ||
Dataset ds = findDatasetOrDie(dsid); | ||
|
||
DatasetVersion dv = datasetversionService.findByFriendlyVersionNumber(ds.getId(), versionNumber); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use getDatasetVersionOrDie here (and in the other two calls to findByFriendlyVersionNumber in this PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I saw it but looking now, getDatasetVersionOrDie doesn't support the friendlyVersionNumber syntax which is a ~requirement here (that's the convention used in the Bag naming and metadata that the archiver gets). I can go ahead and add parsing for that which would have the presumably useful side effect of letting other datasetversion api calls support the friendly version number as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should. I'm seeing handleSpecific(long major, long minor)
. It's used by https://guides.dataverse.org/en/5.11/api/native-api.html#get-version-of-a-dataset which has a "friendly" example of "1.0".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - you're right. I missed the string parsing in handleVersion(). I'll update the PR to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - calls to this are counted with MakeDataCounts. I guess since these are API calls they should count? (although they are clearly system-level interactions and not end-user interaction with the data). In any case, I went ahead for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno. I'd leave this out of Make Data Count. Like you said, these are systems setting and retrieving archival status messages. The spirit of Make Data Count is views/investigations and downloads/requests. People and machines looking at data.
|
||
@GET | ||
@Produces(MediaType.APPLICATION_JSON) | ||
@Path("/submitDatasetVersionToArchive/{id}/{version}/status") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitDatasetVersionToArchive is a weird name. submitDataVersionToArchive (Data instead of Dataset) is under /api/admin and documented under installation/config.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. So far it ~mirrors the /api/admin/submitDatasetVersionToArchive call (name changed to say 'Dataset' in #8610 which hasn't merged yet), which seemed reasonable when it was a single call. With the status calls, I initially had them in /api/admin as well, but eventually decided they should move to /api/datasets (see the comment about superuser being required on those). With that, they could be renamed - e.g. to /api/datasets/<id>/<version>/archivalStatus .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new name ending with /archivalStatus
. Thanks.
@@ -0,0 +1,2 @@ | |||
UPDATE datasetversion SET archivalCopyLocation = CONCAT('{"status":"success", "message":"', archivalCopyLocation,'"}') where archivalCopyLocation is not null and not archivalCopyLocation='Attempted'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this script is only needed by TDL as suggested in the PR description, perhaps we don't need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UPDATE datasetversion SET archivalCopyLocation = CONCAT('{"status":"success", "message":"', archivalCopyLocation,'"}') where archivalCopyLocation is not null
is needed for standard instances (those that have used archiving and therefore have non-null entries). The and not archivalCopyLocation='Attempted';
and the second line handle the case that TDL deployed which was in the initial PR #8610 which has gotten passed by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess my understanding is that both lines are needed or at least won't hurt anything.
GDCC/8605-add-archival-status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I played around with the tests in DatasetsIT. I didn't test the SQL upgrade script.
What this PR does / why we need it: To support more sophisticated Archivers (i.e. those that can provide status feedback and may have multistep internal processes), this PR adds support for managing this status. Specifically it changes the archivalCopyLocation from being a null/String (originally intended as a URL identifying/providing a landing page for the archival copy in the archiving system) to being a json object that contains a 'status' of 'success'/'pending'/'failure' and a 'message' that is again a string. In the success case, the message is again intended as an identifier/landing page URL whereas for failure and pending, the message can be an informative string.
As noted in the issue, this work is supported as part of the Harvard Data Commons project (3A) for use specifically with the DRS Archiver. However, the PR includes updates to the other existing archivers to use the same format (although these currently only have success and failure status, no pending states.)
Which issue(s) this PR closes:
Special notes for your reviewer:
Suggestions on how to test this: The new API supports get/set/delete of the status values. The simplest test would be to configure an archiver, such as the Local file archiver and use the API to retrieve the status and verify the success message. (I think misconfiguration of that, e.g. pointing to a directory where the archiver can't write, should allow viewing a failure status as well.
Also note that another PR will be coming that will show the archival status in the versions table - more opportunity to test the api with that.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No - this is db/api only
Is there a release notes update needed for this change?: part of #8611
Additional documentation: