DataciteXML changes Plus RelationType field #10632

Open
wants to merge 120 commits into
base: develop

Conversation

qqmyers
Member

@qqmyers qqmyers commented Jun 14, 2024

What this PR does / why we need it: This PR adds a RelationType child field to the related publication parent field and uses it to provide a RelationType in the OpenAIRE and DataCite XML exports and in the DataCite XML sent to DataCite (and in the JSON and OAI_ORE exports, which include all fields). It builds upon #10615 and should be reviewed/QA'd after that (or we can create a PR against that branch to more easily see the changes that just add a RelationType).

Which issue(s) this PR closes:

Relates to:

Special notes for your reviewer:

Suggestions on how to test this: Nominally, the new XMLTemplateTest (and all others) should pass, and it should be possible to publish datasets with any/all metadata using a DataCite test account. The log shouldn't contain any entries where DataCite responds with a 422 and indicates that the XML doesn't comply with their 4.5 schema. There should be lots of additional metadata for related publications; author entries should include ORCID info if provided; and affiliations and GrantNumberAgency should have ROR info if a ROR rather than plain text was entered. Typos, such as a related publication with id type doi and either no or non-DOI entries for the identifier and URL, should result in a log message and that particular related publication not being included in the XML, but otherwise should not cause a failure to update the XML. Etc.
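
For anyone testing the "typo" case, the expected skip-and-log behavior can be pictured roughly as below. This is only a minimal sketch, not the code in this PR: the class `RelatedPublicationFilter`, its nested `RelatedPublication` holder, the helper `extractDoi`, and the simplified DOI pattern are all hypothetical names and assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from the Dataverse code base.
class RelatedPublicationFilter {
    private static final Logger logger = Logger.getLogger(RelatedPublicationFilter.class.getName());

    // Assumption: a simplified DOI shape ("10.", 4-9 digits, "/", non-empty suffix).
    private static final Pattern DOI = Pattern.compile("10\\.\\d{4,9}/\\S+");

    /** Simplified stand-in for one related publication entry. */
    static class RelatedPublication {
        final String idType;
        final String idNumber;
        final String url;

        RelatedPublication(String idType, String idNumber, String url) {
            this.idType = idType;
            this.idNumber = idNumber;
            this.url = url;
        }
    }

    /** Returns the bare DOI found in the value, or null if the value does not look like a DOI. */
    static String extractDoi(String value) {
        if (value == null) {
            return null;
        }
        String v = value.trim();
        String lower = v.toLowerCase();
        String[] prefixes = {"https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:"};
        for (String prefix : prefixes) {
            if (lower.startsWith(prefix)) {
                v = v.substring(prefix.length());
                break;
            }
        }
        return DOI.matcher(v).matches() ? v : null;
    }

    /** Keeps every entry except those that claim id type doi but carry no DOI in identifier or URL. */
    static List<RelatedPublication> exportable(List<RelatedPublication> pubs) {
        List<RelatedPublication> kept = new ArrayList<>();
        for (RelatedPublication p : pubs) {
            boolean claimsDoi = "doi".equalsIgnoreCase(p.idType);
            if (claimsDoi && extractDoi(p.idNumber) == null && extractDoi(p.url) == null) {
                // Log and skip this one entry; the rest of the export continues.
                logger.warning("Skipping related publication: id type is doi but no DOI was found");
                continue;
            }
            kept.add(p);
        }
        return kept;
    }
}
```

The point of the sketch is that a bad entry is dropped on its own, with a log message, rather than failing the whole XML update.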

FWIW: I have been able to run this on all the QDR production data and have everything update OK (though we have a few typos in the metadata to fix).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Yes, it adds "Relation Type" to "Related Publication":

Screenshot 2024-09-17 at 2 40 17 PM
Screenshot 2024-09-17 at 2 40 23 PM

Is there a release notes update needed for this change?: included.

Additional documentation: As noted in the release note, there's a long doc listing ~all of the intended changes from the previous version - see https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing.

Changes to the guides can be previewed at https://dataverse-guide--10632.org.readthedocs.build/en/10632/admin/dataverses-datasets.html#send-metadata-to-pid-provider

@pdurbin pdurbin self-assigned this Sep 17, 2024
these were going through the default check for URLs and failing (not a
url) leading to a warning. The new code should try URL parsing for URLs,
try PID and URL parsing for ones with no type specified, and send the
rest of the identifiers w/o any additional (optional) attributes.
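
The three-way handling described in that commit message could look something like the sketch below. It is an illustration under assumptions, not the PR's implementation: `RelatedIdentifierClassifier`, the `Kind` enum, and both `looksLike...` helpers are hypothetical, and falling back to a plain identifier when URL parsing fails is a guess at the intended behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; not the classes used in this PR.
class RelatedIdentifierClassifier {

    /** How the identifier would be sent; PLAIN means no additional optional attributes. */
    enum Kind { URL, PID, PLAIN }

    /** True for syntactically valid http(s) URIs that have a host. */
    static boolean looksLikeUrl(String value) {
        try {
            URI uri = new URI(value.trim());
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Assumption: only a few common PID spellings are recognized here. */
    static boolean looksLikePid(String value) {
        String v = value.trim().toLowerCase();
        return v.startsWith("doi:") || v.startsWith("hdl:") || v.matches("10\\.\\d{4,9}/\\S+");
    }

    static Kind classify(String idType, String value) {
        if (value == null || value.trim().isEmpty()) {
            return Kind.PLAIN;
        }
        boolean noType = idType == null || idType.trim().isEmpty();
        if (noType) {
            // No type specified: try PID parsing first, then URL parsing.
            if (looksLikePid(value)) {
                return Kind.PID;
            }
            return looksLikeUrl(value) ? Kind.URL : Kind.PLAIN;
        }
        if ("url".equalsIgnoreCase(idType.trim())) {
            // Declared as a URL: only URL parsing is attempted.
            return looksLikeUrl(value) ? Kind.URL : Kind.PLAIN;
        }
        // Every other declared type is sent as-is, without optional attributes.
        return Kind.PLAIN;
    }
}
```

With this shape, an identifier of some other declared type never reaches the URL check, so it cannot trigger a "not a url" warning.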
Member

@pdurbin pdurbin left a comment

4000 lines is a lot to review but here's some initial feedback.

doc/release-notes/10632-DataCiteXMLandRelationType.md (5 review comments, outdated and resolved)
src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java (1 review comment, outdated and resolved)
@qqmyers
Member Author

qqmyers commented Sep 17, 2024

OK - I think I addressed all the comments.

@qqmyers qqmyers removed their assignment Sep 17, 2024
@pdurbin
Member

pdurbin commented Sep 17, 2024

Jenkins is failing but I pushed a minor doc tweak to force another run.

Member

@pdurbin pdurbin left a comment

More thoughts on the docs.


Additional metadata, including metadata about Related Publications is now being sent to DataCite when DOIs are registered and published and is available in the DataCite XML export. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed. The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema.

For details see https://github.com/IQSS/dataverse/pull/10632 and https://github.com/IQSS/dataverse/pull/10615 and the [design document](https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing) referenced there.
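
The fallback for existing datasets mentioned in the release note amounts to a simple default. A minimal sketch, assuming a hypothetical helper class `RelationTypeDefaults` (not part of this PR) and that a blank stored value counts as "not specified":

```java
// Hypothetical illustration only; not the code in this PR.
class RelationTypeDefaults {

    /** Default named in the release note for entries with no Relation Type. */
    static final String DEFAULT_RELATION_TYPE = "IsSupplementTo";

    /** Returns the stored relation type, or the default when none was specified. */
    static String effectiveRelationType(String storedValue) {
        if (storedValue == null || storedValue.trim().isEmpty()) {
            return DEFAULT_RELATION_TYPE;
        }
        return storedValue.trim();
    }
}
```

Under that assumption, related publications saved before this change would export the same relation they were implicitly given, without needing a data migration.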
Member

The real meat of what's changing is squirreled away in a Google doc. This doesn't quite sit right with me. 🤔 This is the stuff users might like to know. And the stuff QA will test against.

However, our release notes tend to get long and I'm not sure the details should be here either.

The more I think about it... I'd prefer to have the Google doc copied and pasted here into the release notes. Git is a much better way to preserve this information. And it keeps the info with the pull request.

I'm open to other ideas, of course. Perhaps a new changelog in the guides? Or throw it in the API changelog? A separate text file linked from the release notes and/or the guides?

Member Author

I'm hesitant to focus on the 40+ changes, many of which are only relevant if you're comparing old, new, and OpenAIRE closely (and are really closer to per-commit changes we usually make). I've added some additional detail to the release note to try and give more of a sense of the scope of the change (v4.5 schema, files, license/terms info, PIDs, ...).

Member

I still think this information should be in git but I give up.

Thanks for the additional information in the release note. It does help.

Member

@pdurbin pdurbin left a comment

A couple more comments on code.

Member

@pdurbin pdurbin left a comment

API tests are passing. This is a lot to review and QA but I'm happy enough with how the code and docs look. This will be a great feature. People have been asking us to send more metadata to DataCite for years. Approved.

Labels
Consider For Next Release (A simple change (eg bug fix) that would be good to prioritize since it has been seen in the wild)
FY25 Sprint 5
FY25 Sprint 6
GDCC: QDR (of interest to QDR)
Size: 10 (A percentage of a sprint. 7 hours.)
Projects
Status: Ready for QA ⏩
Development

Successfully merging this pull request may close these issues.

Align or merge DataCite metadata exports
5 participants