10982 10909 Allow using OAI-PMH identifiers as persistent ids of harvested datasets #11049

Merged
merged 17 commits into from
Nov 27, 2024

Conversation

landreev
Contributor

@landreev landreev commented Nov 25, 2024

What this PR does / why we need it:

This PR reconciles two implementations of the same feature, mine and @stevenferey's, hence the two issue numbers in the PR name. It will close only one of them, #10982. The other issue needs more work on the other features being added as part of it, so it is going back to "on hold" for now.

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:

Below is a real-life example of an OAI-PMH archive affected by the feature added in this PR.

Create a harvesting client as follows:

harvest url: https://data.progedo.fr/oai
set: None
metadata format: oai_dc
archive type: Generic OAI archive
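
The same client could also be created through the native API rather than the UI. The following is a minimal sketch, not the project's own tooling: the endpoint path and JSON field names (in particular `useOaiIdentifiersAsPids`) are assumptions based on the "Create a Harvesting Client" section of the API guide linked under "Additional documentation" and should be checked against it. The server URL, client nickname `progedo`, collection alias `root`, archive URL, and API token are hypothetical placeholders. The script only builds the request; the line that would send it is commented out.

```python
import json
import urllib.request


def build_client_payload(dataverse_alias, harvest_url, archive_url,
                         metadata_format="oai_dc", oai_set=None,
                         use_oai_identifiers_as_pids=True):
    """Build the JSON body for a new harvesting client.

    Field names are assumed from the native API guide; verify before use.
    """
    payload = {
        "dataverseAlias": dataverse_alias,
        "harvestUrl": harvest_url,
        "archiveUrl": archive_url,
        "metadataFormat": metadata_format,
        # Assumed name of the option added by this PR.
        "useOaiIdentifiersAsPids": use_oai_identifiers_as_pids,
    }
    if oai_set:
        # "set: None" in the instructions above means the key is omitted.
        payload["set"] = oai_set
    return payload


def build_create_request(server_url, nickname, api_token, payload):
    """Prepare (but do not send) the POST request that creates the client."""
    url = f"{server_url.rstrip('/')}/api/harvest/clients/{nickname}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = build_client_payload(
        dataverse_alias="root",                  # hypothetical collection alias
        harvest_url="https://data.progedo.fr/oai",
        archive_url="https://data.progedo.fr",   # hypothetical archive URL
    )
    request = build_create_request(
        "http://localhost:8080",                 # hypothetical test server
        "progedo",                               # hypothetical client nickname
        "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # placeholder API token
        payload,
    )
    print(request.get_method(), request.full_url)
    print(json.dumps(payload, indent=2))
    # Uncomment to actually send the request to a running installation:
    # urllib.request.urlopen(request)
```

Keeping the payload construction separate from the network call makes it easy to inspect the body before sending it to a test installation.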

Most, if not all, of their records fail to import when tested with the develop branch prior to this PR. All, or nearly all, should import successfully when tested with this branch. The result of a test run at the time of writing:

Screen Shot 2024-11-25 at 10 20 33 AM

Since this is a real, active archive, its holdings are subject to ongoing change, so your results may vary.
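
To see which identifiers an archive exposes before harvesting, one can issue a standard OAI-PMH `ListIdentifiers` request and read the `<identifier>` element of each record header. The sketch below is illustrative only and is not part of this PR: it builds the request URL and parses a response with the Python standard library. The sample response and its `oai:example.org:...` identifiers are made up for the example and do not come from the archive above.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc", oai_set=None):
    """Build a ListIdentifiers request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if oai_set:
        params["set"] = oai_set
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def header_identifiers(response_xml):
    """Return the OAI identifiers found in the record headers of a response."""
    root = ET.fromstring(response_xml)
    return [
        element.text.strip()
        for element in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
        if element.text
    ]


# Made-up sample response, trimmed to the parts that matter here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:record-1</identifier>
      <datestamp>2024-11-25</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:record-2</identifier>
      <datestamp>2024-11-25</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_identifiers_url("https://data.progedo.fr/oai"))
    for identifier in header_identifiers(SAMPLE_RESPONSE):
        print(identifier)
```

Fetching the printed URL in a browser or with an HTTP client and passing the body to `header_identifiers` would list the identifiers for the real archive; large archives page their results with resumption tokens, which this sketch does not follow.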

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

Preview docs at https://dataverse-guide--11049.org.readthedocs.build/en/11049/api/native-api.html#create-a-harvesting-client

@coveralls

coveralls commented Nov 25, 2024

Coverage Status

coverage: 22.486% (+0.04%) from 22.447%
when pulling 40fe665 on 10909-oai-identifiers-as-pids
into f95c1a0 on develop.


@landreev landreev added the Size: 10 A percentage of a sprint. 7 hours. label Nov 25, 2024
@landreev landreev added this to the 6.5 milestone Nov 25, 2024
@landreev landreev marked this pull request as ready for review November 25, 2024 15:23
@pdurbin pdurbin changed the title 10982 10909 Allow using OAI-PMH identifiers as persistent ids of harvested datasets Allow using OAI-PMH identifiers as persistent ids of harvested datasets Nov 25, 2024
@pdurbin pdurbin self-assigned this Nov 25, 2024
@pdurbin pdurbin changed the title Allow using OAI-PMH identifiers as persistent ids of harvested datasets 10982 10909 Allow using OAI-PMH identifiers as persistent ids of harvested datasets Nov 25, 2024
Member

@pdurbin pdurbin left a comment

Overall, looks great. Here's some initial feedback.

landreev and others added 2 commits November 25, 2024 11:07
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>


…added when resolving a conflict with a cherry-picked commit, which of course changes the checksum)


@landreev landreev assigned pdurbin and unassigned pdurbin and landreev Nov 25, 2024
Member

@pdurbin pdurbin left a comment

I didn't run the code but it looks good (docs too) and the "how to test" instructions seem quite clear. Approved.

@pdurbin pdurbin removed their assignment Nov 25, 2024


@ofahimIQSS ofahimIQSS self-assigned this Nov 26, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10909-oai-identifiers-as-pids
ghcr.io/gdcc/configbaker:10909-oai-identifiers-as-pids

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@ofahimIQSS
Contributor

Tested and passed in internal: 1670 files were successfully harvested.
Testing of 11049.docx

@ofahimIQSS ofahimIQSS merged commit 3c427c1 into develop Nov 27, 2024
12 checks passed
@ofahimIQSS ofahimIQSS deleted the 10909-oai-identifiers-as-pids branch November 27, 2024 15:08
@ofahimIQSS ofahimIQSS removed their assignment Nov 27, 2024
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

Feature Request: Request identifier support for OAI_DC harvesting
4 participants