Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDCC/8749 S3 Archiver #8751

Merged

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented May 26, 2022

What this PR does / why we need it: Implements an S3 Archiver consistent with recent updates to add archival status support, microprifile config of s3 connections, etc.

Which issue(s) this PR closes:

Closes #8749

Special notes for your reviewer: Builds on #8748 - primarily an S3 archiver class and docs with some minor refactoring to pull a few lines of common code into the abstract base class.

Suggestions on how to test this: Follow the configuration instructions to setup an S3 archiver and test it the same way as a local archiver. The same bag/datacite xml files should be sent to the bucket as would have been sent to the local file system.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: no

Is there a release notes update needed for this change?: addressed in #8611

Additional documentation:

@qqmyers qqmyers added HDC Harvard Data Commons HDC: 3a Harvard Data Commons Obj. 3A labels May 26, 2022
@coveralls
Copy link

coveralls commented May 26, 2022

Coverage Status

Coverage decreased (-0.03%) to 19.688% when pulling 7d239ba on GlobalDataverseCommunityConsortium:GDCC/8749-S3Archiver into ac49488 on IQSS:develop.

@qqmyers qqmyers mentioned this pull request May 26, 2022
@sekmiller sekmiller self-assigned this Jul 25, 2022
curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" -X PUT "$SERVER_URL/api/datasets/:persistentId/$VERSION/archivalStatus?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"
>>>>>>> refs/remotes/IQSS/develop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like an unresolved merge conflict above

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flyway script version needs to be updated.

}

public boolean isSomeVersionArchived() {
for (DatasetVersion dv : dataset.getVersions()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for this method to be on the DatasetServiceBean?

@@ -5570,6 +5570,65 @@ public void archiveVersion(Long id) {
}
}
}

public boolean isArchivable() {
String className = settingsWrapper.getValueForKey(SettingsServiceBean.Key.ArchiverClassName, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for this method (and the next one) to be on the ArchiverUtil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sekmiller - Maybe? These, and the one above are all used in the dataset-versions.xhtml file so moving them completely to ArchiverUtil (or DatasetServiceBean above) means having to make those classes (with ArchiverUtil not being a bean) accessible in the xhtml and not being able to use settingsWrapper to cache the property values(again because , with ArchiverUtil is not a bean/ is called from the API where there is no view/session scope).

One way I could simplify a bit would be to shift the reflective call logic into ArchiverUtil, e.g. a getArchiverOption(String className, String methodToInvoke) method. That would reduce duplicate code and minimize what's in DatasetPage.

Does that make sense? Or is it worth moving them out of DatasetPage completely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just isSomeVersionArchived which is in the DatasetPage as well as Admin and Datasets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe it's not worth the trouble

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do some quick de-duplication and push it back. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the guts of someVersionArchived to the util but the rest seem hard - there are methods that require a dataset and some that are static, some use settingsWrapper and some settingsServiceBean due to their context (UI or API) so as I tried to refactor I didn't see good opportunities to consolidate.

TDL/7493-improve_BagGnerator_failure_handling
@sekmiller sekmiller removed their assignment Aug 3, 2022
@pdurbin pdurbin self-assigned this Aug 3, 2022
@pdurbin
Copy link
Member

pdurbin commented Aug 3, 2022

I know at standup I said I'd move this over to QA but I just noticed the API tests didn't run so I just pushed a typo fix to enable another run.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, tests are passing now so I'm hitting "Approve" (@sekmiller is the one who did the actual review).

@pdurbin pdurbin removed their assignment Aug 3, 2022
@scolapasta scolapasta added this to the 5.12 milestone Aug 5, 2022
@kcondon kcondon self-assigned this Aug 5, 2022
@kcondon kcondon merged commit 1e52809 into IQSS:develop Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
HDC Harvard Data Commons HDC: 3a Harvard Data Commons Obj. 3A
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HDC 3A: Support archiving to an S3 bucket
6 participants