DataciteXML changes Plus RelationType field #10632

Open
wants to merge 120 commits into
base: develop
4878cfe
separate metadata parsing/params from XML generation code
qqmyers May 3, 2024
68792c2
extract some common xml writing util code
qqmyers May 3, 2024
1a46155
note duplicate method
qqmyers May 3, 2024
ace656c
remove xml template doc, refactor to generate xml, adding OA fields
qqmyers May 3, 2024
dba03e2
refactor source of XML info
qqmyers May 3, 2024
af3e24b
add code to get raw alphanumeric pid value
qqmyers May 3, 2024
fa23884
remove duplicate method
qqmyers May 3, 2024
0d22d6c
dates, resourceType, alternate Ids
qqmyers May 3, 2024
d69bf41
more methods
qqmyers May 8, 2024
04b367f
only one field to look for
qqmyers May 15, 2024
003431d
use common util method
qqmyers May 15, 2024
fea2f5e
access rights descriptions, geolocations, funding refs
qqmyers May 15, 2024
3c52b6a
altTitles npe
qqmyers May 17, 2024
bab2a0d
fixes and test
qqmyers May 18, 2024
3cca63d
fix for empty rel pub entry
qqmyers May 20, 2024
30c80a9
bugs: remove bad nesting, dupe values
qqmyers May 20, 2024
a2acdeb
add XML Validation to test
qqmyers May 20, 2024
3ec7a0b
fix contributorType
qqmyers May 23, 2024
842dee6
add geolocations element and multiple geolocation
qqmyers May 23, 2024
81a7c4a
typos
qqmyers May 23, 2024
ed5eab0
try execute inside the main method
qqmyers May 24, 2024
39673f0
Fix subject, keyword
qqmyers May 24, 2024
36097d6
fix geo coverage
qqmyers May 24, 2024
a5d3b3e
adjust funders to include grant number, add xml escaping for description
qqmyers May 24, 2024
8a12444
bug: add dataset descriptions
qqmyers May 24, 2024
f3e5dc1
typo, add xml escape for funder
qqmyers May 24, 2024
5610c95
still typo
qqmyers May 24, 2024
7148b03
mark contact as deprecated - unused
qqmyers May 24, 2024
0470459
more fixes
qqmyers May 24, 2024
c0265da
catch parseexception
qqmyers May 24, 2024
2ff8678
fix alternateIdentifier, related PID parsing, series
qqmyers May 24, 2024
182f3d7
catch PID update exception to avoid corrupt dataset
qqmyers May 24, 2024
be90355
try long sleep
qqmyers May 24, 2024
e458e8c
set dv released before pid publicize, go back to short time
qqmyers May 24, 2024
27fe7b4
always use latest version for copy
qqmyers May 24, 2024
00a3830
handle deaccession, fix relatedIDtype for files
qqmyers May 28, 2024
1faf0cd
missed assignment for title
qqmyers May 28, 2024
23dd581
fix creator for deaccessioned
qqmyers May 28, 2024
3bbd2e9
correct fix for creators when deaccessioned
qqmyers May 28, 2024
4def6da
remove bad value and lang
qqmyers May 28, 2024
eac477e
add creatorName sub element for deaccession/no names case
qqmyers May 28, 2024
154ac8a
typo
qqmyers May 28, 2024
9144f6c
fix resourceType - always 1 entry
qqmyers May 28, 2024
a5870fb
Also handle file case for resourceType
qqmyers May 28, 2024
24db2af
missed changes
qqmyers May 31, 2024
f0fd61a
simplify - util checks for null and empty
qqmyers May 31, 2024
ead153f
typo in DOI parsing logic
qqmyers Jun 10, 2024
ea75216
only files in latestversionforcopy
qqmyers Jun 10, 2024
33f8f30
Merge remote-tracking branch 'IQSS/develop' into datacite_xml_improve…
qqmyers Jun 14, 2024
b6bd530
fix date parsing, clear bad values
qqmyers Jun 11, 2024
e1383d7
relationType entry in citation block
qqmyers Jun 14, 2024
93faade
missing element for openaireutil test
qqmyers Jun 14, 2024
c9084e3
contributor type null fix
qqmyers Jun 14, 2024
cdd6d6f
add relationType to base code and DataCite XML
qqmyers Jun 14, 2024
360d3fa
add relationType to above fold display
qqmyers Jun 14, 2024
53ded9e
typos
qqmyers Jun 14, 2024
6aade3a
handle blank id no and styling in above fold summary
qqmyers Jun 14, 2024
347971f
skip blanks in geo place name entries
qqmyers Jun 17, 2024
bc5686b
use ; to separate kindOfData / resourceTypes
qqmyers Jun 17, 2024
db934bb
add Time Period as Other Date
qqmyers Jun 18, 2024
9efe597
support available and updated dates for dataset and file
qqmyers Jun 18, 2024
de37314
fix file updated logic
qqmyers Jun 18, 2024
0bbce02
Merge branch 'datacite_xml_improvements' into datacite_plus_relPubRel…
qqmyers Jun 19, 2024
6357d92
fix no relType styling
qqmyers Jun 18, 2024
c624c0c
handle null
qqmyers Jun 18, 2024
0c2cff1
add HasPart rels - logic issue
qqmyers Jun 24, 2024
a177a08
catch additional exception type
qqmyers Jun 24, 2024
9643b07
Merge branch 'datacite_xml_improvements' into
qqmyers Jun 26, 2024
fed231f
fix rendering with a span
qqmyers Jun 26, 2024
08843d8
missing imports, null check
qqmyers Jun 26, 2024
c4df868
Merge remote-tracking branch 'IQSS/develop' into datacite_xml_improve…
qqmyers Jun 26, 2024
394adcd
Merge branch 'datacite_xml_improvements' into
qqmyers Jun 26, 2024
1cceab1
Merge remote-tracking branch 'IQSS/develop' into datacite_xml_improve…
qqmyers Jul 19, 2024
33d123f
Merge branch 'datacite_xml_improvements' into datacite_plus_relPubRel…
qqmyers Jul 19, 2024
6c87b9e
fix ROR identification
qqmyers Jul 5, 2024
182cfdd
passthrough for ext cvv/ROR affiliation update
qqmyers Jul 9, 2024
f53dacf
Merge branch 'datacite_xml_improvements' into datacite_plus_relPubRel…
qqmyers Jul 21, 2024
e2b88ca
Merge branch 'datacite_xml_improvements' into
qqmyers Jul 5, 2024
2124ce6
add relation type values
qqmyers Jul 5, 2024
6e1919a
fix case
qqmyers Jul 8, 2024
567b111
Merge remote-tracking branch 'IQSS/develop' into
qqmyers Sep 3, 2024
d19e7f2
missing empty watermark entry
qqmyers Sep 3, 2024
5c041e4
fix capitalization
qqmyers Sep 3, 2024
f5326c9
Merge remote-tracking branch 'IQSS/develop' into
qqmyers Sep 3, 2024
ea373af
Merge remote-tracking branch 'IQSS/develop' into
qqmyers Sep 6, 2024
7e1f73b
Merge branch 'datacite_xml_improvements' into
qqmyers Sep 6, 2024
b432778
fix test
qqmyers Sep 6, 2024
4e298a3
update tests - added one field in citation block
qqmyers Sep 9, 2024
3d6cd3a
release note
qqmyers Sep 9, 2024
4f539ba
support no pubIdType for URLs
qqmyers Sep 10, 2024
25d63a7
direct people to the log for failures - they aren't in the response
qqmyers Sep 10, 2024
92bf051
bug - the _target url isn't being set elsewhere
qqmyers Sep 10, 2024
f4c5164
avoid failing when the entity is null for error statuses
qqmyers Sep 10, 2024
45156be
don't update unpublished files - no need and it will fail
qqmyers Sep 10, 2024
5998e7a
lower logging, add null check on relatedIdentifier
qqmyers Sep 10, 2024
3ef9557
Change to use POST for all
qqmyers Sep 10, 2024
4e7d22d
Documentation and updated release note
qqmyers Sep 10, 2024
165bb9d
Merge remote-tracking branch 'IQSS/develop' into datacite_plus_relPub…
qqmyers Sep 11, 2024
1865a81
test fix - number of fields
qqmyers Sep 11, 2024
bb7adee
update release note
qqmyers Sep 12, 2024
bde5147
check for ROR in grantAgency field too
qqmyers Sep 12, 2024
280ed49
adopt using CDI, fix funderIdentifier element per schema
qqmyers Sep 12, 2024
f255f19
release note/changelog changes
qqmyers Sep 12, 2024
194dae8
don't send contributors w/o contributorType
qqmyers Sep 13, 2024
35ff432
relatedIdentifierType is required
qqmyers Sep 13, 2024
6c8f73e
flip to prefer identifier over url
qqmyers Sep 14, 2024
86aec68
Handle case where type is set but there's no identifier
qqmyers Sep 14, 2024
203add1
map non-standard contributors to Other, remove unused imports
qqmyers Sep 14, 2024
b06e620
Treat missing contrib type as Other
qqmyers Sep 14, 2024
41c9b29
avoid spurious log warning for others e.g. isbn
qqmyers Sep 17, 2024
6a05bef
changes per review
qqmyers Sep 17, 2024
24a1bdf
Merge remote-tracking branch 'IQSS/develop' into datacite_plus_relPub…
qqmyers Sep 17, 2024
174fcf9
Apply suggestions from code review
qqmyers Sep 17, 2024
088b735
Merge branch 'datacite_plus_relPubRelType' of https://github.com/Qual…
qqmyers Sep 17, 2024
4ae0599
cleaner formatting
qqmyers Sep 17, 2024
8141a78
minor doc tweak #10632
pdurbin Sep 17, 2024
c6dd220
No longer needed with use of CDI.current() in XMLMetadataTemplate
qqmyers Sep 17, 2024
b1e5020
no longer used and CrossRef ended up using it's own.
qqmyers Sep 17, 2024
0c80b2c
Merge branch 'datacite_plus_relPubRelType' of https://github.com/Qual…
qqmyers Sep 17, 2024
87bd308
add more info about the scope of changes.
qqmyers Sep 17, 2024
2 changes: 2 additions & 0 deletions conf/solr/schema.xml
@@ -352,6 +352,7 @@
<field name="productionPlace" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publication" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationCitation" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationRelationType" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationIDNumber" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationIDType" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="publicationURL" type="text_en" multiValued="true" stored="true" indexed="true"/>
@@ -593,6 +594,7 @@
<copyField source="productionPlace" dest="_text_" maxChars="3000"/>
<copyField source="publication" dest="_text_" maxChars="3000"/>
<copyField source="publicationCitation" dest="_text_" maxChars="3000"/>
<copyField source="publicationRelationType" dest="_text_" maxChars="3000"/>
<copyField source="publicationIDNumber" dest="_text_" maxChars="3000"/>
<copyField source="publicationIDType" dest="_text_" maxChars="3000"/>
<copyField source="publicationURL" dest="_text_" maxChars="3000"/>
41 changes: 41 additions & 0 deletions doc/release-notes/10632-DataCiteXMLandRelationType.md
@@ -0,0 +1,41 @@
### Enhanced DataCite Metadata, Relation Type

A new field has been added to the citation metadatablock to allow entry of the "Relation Type" between a "Related Publication" and a dataset. The Relation Type is currently limited to the six most common values recommended by DataCite: IsCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed.
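
The defaulting rule above can be sketched as follows. This is a minimal illustration, not code from Dataverse: the helper name and the dict layout are hypothetical, rejecting unknown values with an exception is an assumption, and only the six values and the `IsSupplementTo` default come from the note itself.

```python
# Hypothetical helper: names and data layout are illustrative only.
ALLOWED_RELATION_TYPES = (
    "IsCitedBy", "Cites", "IsSupplementTo",
    "IsSupplementedBy", "IsReferencedBy", "References",
)
DEFAULT_RELATION_TYPE = "IsSupplementTo"


def resolve_relation_type(publication: dict) -> str:
    """Return the relation type for one related-publication entry.

    Falls back to the default when the field is missing or blank, and
    rejects values outside the six listed in the release note.
    """
    value = (publication.get("publicationRelationType") or "").strip()
    if not value:
        return DEFAULT_RELATION_TYPE
    if value not in ALLOWED_RELATION_TYPES:
        raise ValueError(f"unsupported relation type: {value!r}")
    return value
```

An entry without the field, such as one created before this change, resolves to `IsSupplementTo` in this sketch.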

Dataverse now supports the DataCite v4.5 schema. Additional metadata, including metadata about Related Publications and files in the dataset, is now being sent to DataCite, and improvements have been made to how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata are represented. The enhanced metadata will automatically be sent when datasets are created and published and is available in the DataCite XML export after publication.

The additions roughly align with the OpenAIRE XML export, with some minor differences beyond the new Relation Type, including the update to the DataCite v4.5 schema. For details, see https://github.com/IQSS/dataverse/pull/10632 and https://github.com/IQSS/dataverse/pull/10615 and the [design document](https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing) referenced there.

Multiple backward-incompatible changes and bug fixes have been made to API calls (three of the four were previously undocumented) related to updating PID target URLs and metadata at the provider service:
- [Update Target URL for a Published Dataset at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-target-url-for-a-published-dataset-at-the-pid-provider)
- [Update Target URL for all Published Datasets at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-target-url-for-all-published-datasets-at-the-pid-provider)
- [Update Metadata for a Published Dataset at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-metadata-for-a-published-dataset-at-the-pid-provider)
- [Update Metadata for all Published Datasets at the PID provider](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#update-metadata-for-all-published-datasets-at-the-pid-provider)

Upgrade instructions
--------------------

The Solr schema has to be updated via the normal mechanism to add the new "publicationRelationType" field.
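
One way to confirm that an updated schema picked up the change is to look for both the field and its `copyField`, which the `conf/solr/schema.xml` diff in this pull request adds under the name `publicationRelationType`. The sketch below uses Python's standard XML parser. The function name and the inline sample are illustrative assumptions, and a real check would read the deployed `schema.xml` from disk.

```python
import xml.etree.ElementTree as ET

FIELD = "publicationRelationType"


def schema_has_relation_type(schema_xml: str) -> bool:
    """True when the schema defines the field and copies it into _text_."""
    root = ET.fromstring(schema_xml)
    has_field = any(f.get("name") == FIELD for f in root.iter("field"))
    has_copy = any(
        c.get("source") == FIELD and c.get("dest") == "_text_"
        for c in root.iter("copyField")
    )
    return has_field and has_copy


# Trimmed-down sample that mirrors the two lines added in the diff.
SAMPLE = """<schema>
  <field name="publicationRelationType" type="text_en" multiValued="true" stored="true" indexed="true"/>
  <copyField source="publicationRelationType" dest="_text_" maxChars="3000"/>
</schema>"""
```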

The citation metadatablock has to be reinstalled using the standard instructions.

With these two changes, the "Relation Type" field will be available and creation/publication of datasets will result in the expanded XML being sent to DataCite.

To update existing datasets (and files using DataCite DOIs):

Exports can be updated by running `curl http://localhost:8080/api/admin/metadata/reExportAll`

Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented):

`curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll`

This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using

`grep 'Failure for id' server.log`
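
Where a script is more convenient than `grep`, the same filter can be sketched in Python. Only the `Failure for id` marker comes from the note above. The function name is hypothetical, and nothing is assumed about the rest of the log line format.

```python
from typing import Iterable, List

MARKER = "Failure for id"


def failure_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that report a failed PID update."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]
```

Passing an open file handle for `server.log` to `failure_lines` yields the matching lines, which can then be counted or reviewed one by one.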

Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the [Reserve a PID](https://guides.dataverse.org/en/latest/api/native-api.html#reserve-a-pid) API call and the newly documented `/api/datasets/<id>/modifyRegistration` call respectively. See https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions.

PIDs can also be updated by a superuser on a per-dataset basis using

`curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/<id>/modifyRegistrationMetadata`
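
For scripted use, the per-dataset call above can be prepared with Python's standard library. This is a sketch under stated assumptions: the helper name is hypothetical, the base URL and key are placeholders, and the function only builds the request without sending it.

```python
from urllib.request import Request


def build_metadata_update_request(base_url: str, dataset_id: int, api_key: str) -> Request:
    """Build (but do not send) the per-dataset PID metadata update call."""
    url = f"{base_url.rstrip('/')}/api/datasets/{dataset_id}/modifyRegistrationMetadata"
    # The endpoint expects POST, matching the curl command above.
    return Request(url, method="POST", headers={"X-Dataverse-key": api_key})
```

Passing the returned object to `urllib.request.urlopen` would send the request. As with the curl command, the key must belong to a superuser.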

35 changes: 32 additions & 3 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -195,12 +195,41 @@ Mints a new identifier for a dataset previously registered with a handle. Only a

.. _send-metadata-to-pid-provider:

Send Dataset metadata to PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Update Target URL for a Published Dataset at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Forces update to metadata provided to the PID provider of a published dataset. Only accessible to superusers. ::
Forces update to the target URL provided to the PID provider of a published dataset and ensures the PID is findable.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$dataset-id/modifyRegistration

Update Target URL for all Published Datasets at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Forces update to the target URL provided to the PID provider of all published datasets and ensures the PID is findable.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/modifyRegistrationAll

Update Metadata for a Published Dataset at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Checks to see that the PID metadata for a published dataset (and any released files in it using file PIDs)
is up-to-date at the provider and updates the metadata if necessary.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$dataset-id/modifyRegistrationMetadata

Update Metadata for all Published Datasets at the PID provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Checks to see that the PID metadata is up-to-date at the provider for all published datasets
(and any released files in them using file PIDs) and updates the metadata if necessary.
Only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/modifyRegistrationPIDMetadataAll

The call returns 200/OK as long as the call completes. Any errors for individual datasets are reported in the log.

Check for Unreserved PIDs and Reserve Them
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -7,6 +7,13 @@ This API changelog is experimental and we would love feedback on its usefulness.
:local:
:depth: 1

v6.4
----

- **/api/datasets/$dataset-id/modifyRegistration**: Changed from GET to POST
- **/api/datasets/modifyRegistrationPIDMetadataAll**: Changed from GET to POST


v6.3
----

12 changes: 8 additions & 4 deletions doc/sphinx-guides/source/installation/config.rst
@@ -232,6 +232,10 @@ Dataverse can be configured with one or more PID providers, each of which can mi
to manage an authority/shoulder combination, aka a "prefix" (PermaLinks also support custom separator characters as part of the prefix),
along with an optional list of individual PIDs (with different authority/shoulders) that can be managed with that account.

Dataverse automatically manages assigning PIDs and making them findable when datasets are published. There are also :ref:`API calls that
allow updating the PID target URLs and metadata of already-published datasets manually if needed <send-metadata-to-pid-provider>`, e.g. if a Dataverse instance is
moved to a new URL or when the software is updated to generate additional metadata or address schema changes at the PID service.

Testing PID Providers
+++++++++++++++++++++

@@ -246,11 +250,11 @@ configure the credentials as described below.

Alternately, you may wish to configure other providers for testing:

- EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite.
- EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite.

- The PermaLink provider, like the FAKE DOI provider, does not involve an external account.
Unlike the Fake DOI provider, the PermaLink provider creates PIDs that begin with "perma:", making it clearer that they are not DOIs,
and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.
- The PermaLink provider, like the FAKE DOI provider, does not involve an external account.
Unlike the Fake DOI provider, the PermaLink provider creates PIDs that begin with "perma:", making it clearer that they are not DOIs,
and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.

Provider-specific configuration is described below.

6 changes: 6 additions & 0 deletions scripts/api/data/dataset-create-new-all-default-fields.json
@@ -331,6 +331,12 @@
"typeClass": "compound",
"value": [
{
"publicationRelationType" : {
"typeName" : "publicationRelationType",
"multiple" : false,
"typeClass" : "controlledVocabulary",
"value" : "IsSupplementTo"
},
"publicationCitation": {
"typeName": "publicationCitation",
"multiple": false,