GDCC/8605-add-archival-status-support #8696

qqmyers · 2022-05-13T19:58:37Z

What this PR does / why we need it: To support more sophisticated archivers (i.e. those that can provide status feedback and may have multistep internal processes), this PR adds support for managing archival status. Specifically, it changes archivalCopyLocation from a null/String value (originally intended as a URL identifying or providing a landing page for the archival copy in the archiving system) to a JSON object containing a 'status' of 'success', 'pending', or 'failure' and a 'message' string. In the success case, the message is still intended to be an identifier or landing page URL, whereas for failure and pending it can be any informative string.
As noted in the issue, this work is supported as part of the Harvard Data Commons project (3A) for use specifically with the DRS Archiver. However, the PR also updates the other existing archivers to use the same format (although these currently report only success and failure, with no pending state).
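For readers who want to see the shape of the new value, here is a minimal sketch of the status object described above. The class name, the helper methods, and the example URL are hypothetical and are not taken from the PR; the real code may build the JSON differently (for example with a JSON library rather than string concatenation).

```java
import java.util.Set;

// Hypothetical illustration of the archival status object; not the PR's actual code.
class ArchivalStatusSketch {
    // The three status values named in the PR description.
    static final Set<String> ALLOWED = Set.of("success", "pending", "failure");

    // Builds {"status":"...","message":"..."} and rejects unknown statuses.
    static String toJson(String status, String message) {
        if (!ALLOWED.contains(status)) {
            throw new IllegalArgumentException("Unknown archival status: " + status);
        }
        return "{\"status\":\"" + status + "\",\"message\":\"" + escape(message) + "\"}";
    }

    // Minimal escaping of backslashes and double quotes so the result stays valid JSON.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Success: the message is an identifier or landing page URL (made-up example URL).
        System.out.println(toJson("success", "https://archive.example.org/bags/example-v1.0"));
        // Pending or failure: the message is a free-form informative string.
        System.out.println(toJson("pending", "Queued for ingest"));
    }
}
```

A null archivalCopyLocation still means that no archiving has been attempted, so the sketch only covers the non-null cases.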

Which issue(s) this PR closes:

Closes HDC 3A: support handling archival status updates #8605

Special notes for your reviewer:

Could rename the db column as it is no longer a location.
The API calls follow the original naming convention of the admin API submitDatasetVersionToArchive call, which doesn't fit as well with the /api/datasets convention of having the id next. These could be changed, but that would require changes in the DataCommons service that calls them, and presumably we'd want to align the existing admin call and the batch call in TDL/7493 Batch Archiving #8610. Let me know what you decide.
Also note that the flyway script also handles the 'Attempted' state introduced in GDCC/8604 Improve archiver error handling #8612. Nominally this should only be in development databases and at TDL where this was added to avoid rerunning the archiving for failed datasets when doing batch uploads. That will be superseded/replaced by this PR.
FWIW: This API was added to the /datasets endpoint because the intent is for remote archiving systems (like DRS) to report their status updates and putting it in admin would restrict it to localhost or require changing to the unblock-key policy. The API is limited to superuser use.

Suggestions on how to test this: The new API supports get/set/delete of the status values. The simplest test would be to configure an archiver, such as the Local file archiver, and use the API to retrieve the status and verify the success message. (I think misconfiguring it, e.g. pointing to a directory where the archiver can't write, should allow viewing a failure status as well.)
Also note that another PR will be coming that will show the archival status in the versions table - more opportunity to test the api with that.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No - this is db/api only

Is there a release notes update needed for this change?: part of #8611

Additional documentation:

coveralls · 2022-05-13T20:04:47Z

Coverage decreased (-0.03%) to 19.736% when pulling 7410c5b on GlobalDataverseCommunityConsortium:GDCC/8605-add-archival-status into 567e506 on IQSS:develop.


pdurbin

I didn't run the code yet but here's some initial feedback.

src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

pdurbin · 2022-07-14T20:22:10Z

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

+            }
+            Dataset ds = findDatasetOrDie(dsid);
+
+            DatasetVersion dv = datasetversionService.findByFriendlyVersionNumber(ds.getId(), versionNumber);


Any reason not to use getDatasetVersionOrDie here (and in the other two calls to findByFriendlyVersionNumber in this PR)?

qqmyers · 2022-07-15

Not sure I saw it but looking now, getDatasetVersionOrDie doesn't support the friendlyVersionNumber syntax which is a ~requirement here (that's the convention used in the Bag naming and metadata that the archiver gets). I can go ahead and add parsing for that which would have the presumably useful side effect of letting other datasetversion api calls support the friendly version number as well.

pdurbin · 2022-07-15

It should. I'm seeing handleSpecific(long major, long minor). It's used by https://guides.dataverse.org/en/5.11/api/native-api.html#get-version-of-a-dataset which has a "friendly" example of "1.0".

qqmyers · 2022-07-15

Yep - you're right. I missed the string parsing in handleVersion(). I'll update the PR to use it.

qqmyers · 2022-07-15

Hmm - calls to this are counted with MakeDataCounts. I guess since these are API calls they should count? (although they are clearly system-level interactions and not end-user interaction with the data). In any case, I went ahead for now.

pdurbin · 2022-07-19

I dunno. I'd leave this out of Make Data Count. Like you said, these are systems setting and retrieving archival status messages. The spirit of Make Data Count is views/investigations and downloads/requests. People and machines looking at data.
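To make the "friendly" version syntax discussed in this thread concrete, here is a rough sketch of splitting a string such as "1.0" into the major and minor numbers that a method like handleSpecific(long major, long minor) would receive. The class and method below are hypothetical; this is not the handleVersion() implementation, which is not shown in this thread.

```java
// Hypothetical sketch of friendly version parsing; not Dataverse's handleVersion().
class FriendlyVersionSketch {
    // Splits "major.minor" into two longs, e.g. "1.0" becomes {1, 0}.
    static long[] parse(String friendly) {
        String[] parts = friendly.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected major.minor, got: " + friendly);
        }
        // Long.parseLong throws NumberFormatException for non-numeric parts.
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    public static void main(String[] args) {
        long[] v = parse("1.0");
        System.out.println("major=" + v[0] + ", minor=" + v[1]); // prints "major=1, minor=0"
    }
}
```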

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

pdurbin · 2022-07-14T20:57:00Z

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

+
+    @GET
+    @Produces(MediaType.APPLICATION_JSON)
+    @Path("/submitDatasetVersionToArchive/{id}/{version}/status")


submitDatasetVersionToArchive is a weird name. submitDataVersionToArchive (Data instead of Dataset) is under /api/admin and documented under installation/config.html

qqmyers · 2022-07-15

Yes. So far it ~mirrors the /api/admin/submitDatasetVersionToArchive call (name changed to say 'Dataset' in #8610 which hasn't merged yet), which seemed reasonable when it was a single call. With the status calls, I initially had them in /api/admin as well, but eventually decided they should move to /api/datasets (see the comment about superuser being required on those). With that, they could be renamed - e.g. to /api/datasets/<id>/<version>/archivalStatus .

pdurbin · 2022-07-19

I like the new name ending with /archivalStatus. Thanks.
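As an illustration of how a remote archiving system might call the renamed endpoint, the sketch below only builds the requests with the JDK's java.net.http API and does not send them. It assumes the path proposed above (/api/datasets/<id>/<version>/archivalStatus), assumes PUT as the verb for "set", and uses the X-Dataverse-key header for a superuser API token; the class name, server URL, and token are placeholders, and the merged API may differ in detail.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch; builds requests but never sends them.
class ArchivalStatusRequestSketch {
    // Assumed path shape, taken from the rename proposed in this review thread.
    static URI statusUri(String serverUrl, long datasetId, String version) {
        return URI.create(serverUrl + "/api/datasets/" + datasetId + "/" + version + "/archivalStatus");
    }

    // GET: retrieve the current status object.
    static HttpRequest getStatus(String serverUrl, long datasetId, String version, String apiToken) {
        return HttpRequest.newBuilder(statusUri(serverUrl, datasetId, version))
                .header("X-Dataverse-key", apiToken)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // PUT (assumed verb): set the status, sending the JSON object as the body.
    static HttpRequest setStatus(String serverUrl, long datasetId, String version,
                                 String apiToken, String statusJson) {
        return HttpRequest.newBuilder(statusUri(serverUrl, datasetId, version))
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(statusJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest get = getStatus("http://localhost:8080", 42L, "1.0", "placeholder-token");
        System.out.println(get.method() + " " + get.uri());
    }
}
```

Sending either request would use HttpClient.send(...); that step is left out so the sketch runs without a server.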

src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java

pdurbin · 2022-07-14T21:00:20Z

src/main/resources/db/migration/V5.11.0.1__8605-support-archival-status.sql

@@ -0,0 +1,2 @@
+UPDATE datasetversion SET archivalCopyLocation = CONCAT('{"status":"success", "message":"', archivalCopyLocation,'"}') where archivalCopyLocation is not null and not archivalCopyLocation='Attempted';


If this script is only needed by TDL as suggested in the PR description, perhaps we don't need it.

qqmyers · 2022-07-15

The first part, UPDATE datasetversion SET archivalCopyLocation = CONCAT('{"status":"success", "message":"', archivalCopyLocation,'"}') where archivalCopyLocation is not null, is needed for standard instances (those that have used archiving and therefore have non-null entries). The trailing condition, and not archivalCopyLocation='Attempted';, and the second line handle the case TDL deployed, which came from the initial PR #8610 and has since been overtaken by this PR.

pdurbin · 2022-07-19

Ok, I guess my understanding is that both lines are needed or at least won't hurt anything.
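The effect of the first migration line on a single column value can be sketched in Java as follows. This mirrors only the UPDATE statement shown in the diff; the class and method names are hypothetical, and the second line of the script (which deals with 'Attempted' values) is not shown in this thread, so it is not modeled here.

```java
import java.util.Optional;

// Hypothetical per-value illustration of the first line of the flyway script.
class MigrationSketch {
    // Returns the wrapped value, or empty when the UPDATE's WHERE clause would skip the row.
    static Optional<String> wrapLegacyValue(String archivalCopyLocation) {
        if (archivalCopyLocation == null || archivalCopyLocation.equals("Attempted")) {
            return Optional.empty(); // not touched by this statement
        }
        // Same concatenation as the SQL CONCAT: the old value becomes the success message.
        // Like the SQL, this does not escape quotes inside the old value.
        return Optional.of("{\"status\":\"success\", \"message\":\"" + archivalCopyLocation + "\"}");
    }

    public static void main(String[] args) {
        // Made-up legacy value for illustration.
        System.out.println(wrapLegacyValue("https://archive.example.org/bag/1").orElse("(unchanged)"));
        System.out.println(wrapLegacyValue(null).orElse("(unchanged)"));
    }
}
```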

src/main/java/edu/harvard/iq/dataverse/util/json/JsonUtil.java


pdurbin

Looks good. I played around with the tests in DatasetsIT. I didn't test the SQL upgrade script.

qqmyers added 3 commits May 13, 2022 14:51

Archival status success/pending/failure/null support

de62791

flyway to update existing

8c82c61

fix typos/mistakes

b354bc3

qqmyers added 2 commits May 13, 2022 16:27

basic status logging in existing archivers

9c9ac65

API docs

221ca0b

qqmyers marked this pull request as ready for review May 13, 2022 20:53

qqmyers added the HDC: 3a Harvard Data Commons Obj. 3A label May 17, 2022

qqmyers added the HDC Harvard Data Commons label May 24, 2022

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

8902d9a

mreekie added the pm.sprint.2022_05_25 label May 25, 2022

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

a37922b

qqmyers mentioned this pull request May 26, 2022

HDC 3A: Support single-version-only archiving #8746

Closed

rename flyway

cefa12c

This was referenced May 26, 2022

Gdcc/8746 single version semantics for archiving #8747

Merged

Gdcc/8745 archival status UI #8748

Merged

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

e1c62af

sekmiller self-assigned this Jun 6, 2022

sekmiller removed their assignment Jun 24, 2022

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

d2bf93c

pdurbin self-assigned this Jul 14, 2022

qqmyers added 2 commits July 14, 2022 12:04

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

ae1c97c

update flyway naming

d3a7b04

pdurbin reviewed Jul 14, 2022


pdurbin assigned qqmyers Jul 14, 2022

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

5295bcd

qqmyers mentioned this pull request Jul 15, 2022

TDL/7493 Batch Archiving #8610

Merged

updates per review

9223e7d

qqmyers added 3 commits July 15, 2022 18:38

swap native update

f5396d8

Merge remote-tracking branch 'IQSS/develop' into GDCC/8605-add-archival-status

986f9ff

missed logger.fine

8750e62

qqmyers removed their assignment Jul 18, 2022

qqmyers and others added 7 commits July 19, 2022 14:26

test tweak

5d617f0

fix jsonpath

8fcb59c

fix URLs

d2d817e

add content type on set

6a70d42

application/json

e498417

in docs, show verbs for clarity, s/Json/JSON/ IQSS#8605

8a99685

lower logging IQSS#8605

7362e1c

pdurbin approved these changes Jul 19, 2022


pdurbin removed their assignment Jul 19, 2022

format urls in docs

7410c5b

kcondon merged commit fed27f9 into IQSS:develop Jul 21, 2022

kcondon self-assigned this Jul 25, 2022

pdurbin added this to the 5.12 milestone Jul 25, 2022


