Add import for oai_datacite ("OpenAire") format (this will allow Dataverse to harvest this format too) #7727

Open
tjouneau opened this issue Mar 26, 2021 · 4 comments

Comments

@tjouneau

Hi
Feature : make it possible to harvest in Datacite format(s).
Currently a list of formats is displayed, but only oai_dc seems to work.
image

This is a follow-up to an off-topic discussion with @landreev in issue #7638, which I'm reproducing below.

As for the other formats, I don't really know what "oai_datacite4" is. It is safe to say that of all the (10?) formats they are offering (https://www.zenodo.org/oai2d?verb=ListMetadataFormats) oai_dc is the only one Dataverse understands. I am surprised that we are allowing a user to select an unsupported metadata format in the Harvesting Client config. (I thought we were dropping any unsupported formats from the list).
Looking at an example of oai_datacite4 (https://www.zenodo.org/oai2d?verb=GetRecord&identifier=oai:zenodo.org:204063&metadataPrefix=oai_datacite4), it appears to be simple enough. So it should be very doable to add support for it. But yes, that would definitely need to be handled in a separate issue.
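
For reference, the formats a server advertises can be read from its `ListMetadataFormats` response. A minimal Python sketch, parsing a hand-written and shortened sample rather than calling Zenodo; the sample content and the function name `list_metadata_prefixes` are assumptions for illustration, and only the OAI-PMH namespace and element names come from the protocol:

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 namespace used by every response element.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, shortened sample; a real server lists more formats
# and includes <schema> and <metadataNamespace> for each one.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_datacite4</metadataPrefix>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_metadata_prefixes(xml_text):
    """Return the metadataPrefix values in document order."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iterfind(".//oai:metadataPrefix", OAI_NS)
        if el.text and el.text.strip()
    ]


print(list_metadata_prefixes(SAMPLE_RESPONSE))
```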

Relevant use cases :
I think it is a use case relevant to the ongoing discussion happening in the MD WG about DV as registry (cf. MIT)

  • Datacite is the reference metadata format for multidisciplinary research data
  • Harvesting the best and most complete metadata possible for dataset representations in Dataverse is important, and it might fuel other use cases such as dataset list exports (for evaluation/assessment reports, bibliographies, and so on)

Possible problem :

  • Datacite is a fast-evolving format. Coming back to Zenodo, they expose no fewer than 6 different flavours of Datacite metadata through their OAI server. Backward compatibility, proper mapping, etc. probably have to be assessed.

  • ????

Thanks in advance for your time.
Thomas

@landreev
Contributor

This could be a useful/popular format to support, I agree.
Let's investigate this. From looking at the examples briefly, the format may be a bit richer/have a few more fields than plain DC. But I still think all their fields can be mapped to something in the DDI that we already know how to import. So rather than write new import code, it may be easier to provide an XSLT transform from this format to DDI, and then import that. The OAI code library we are using has mechanisms for doing that in real time.
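
To get a feel for how small the mapping surface is, here is a rough Python sketch. It is not the XSLT transform itself: it only pulls a few DataCite kernel-4 fields out of a record, and mapping them onto DDI elements is not attempted. The sample record, the DOI, and the function name `extract_datacite_fields` are made up for illustration; only the kernel-4 namespace and element names are taken from the DataCite schema.

```python
import xml.etree.ElementTree as ET

# DataCite metadata kernel 4 namespace.
DATACITE_NS = {"d": "http://datacite.org/schema/kernel-4"}

# Hand-written, simplified sample record (not a real dataset).
SAMPLE_RESOURCE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.5072/example</identifier>
  <creators>
    <creator><creatorName>Doe, Jane</creatorName></creator>
  </creators>
  <titles>
    <title>Example dataset</title>
  </titles>
  <publisher>Example Repository</publisher>
  <publicationYear>2021</publicationYear>
</resource>"""


def extract_datacite_fields(xml_text):
    """Collect a handful of citation fields from a kernel-4 record."""
    root = ET.fromstring(xml_text)

    def texts(path):
        # All non-empty text values matching the path, in document order.
        return [
            el.text.strip()
            for el in root.iterfind(path, DATACITE_NS)
            if el.text and el.text.strip()
        ]

    def first(path):
        values = texts(path)
        return values[0] if values else None

    return {
        "identifier": first(".//d:identifier"),
        "titles": texts(".//d:titles/d:title"),
        "creators": texts(".//d:creators/d:creator/d:creatorName"),
        "publisher": first(".//d:publisher"),
        "publication_year": first(".//d:publicationYear"),
    }


print(extract_datacite_fields(SAMPLE_RESOURCE))
```

Because the paths use `.//`, the same function also works when the `resource` element is nested inside an outer wrapper element.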

(We should also add a simple filter to strip the unsupported formats from the list in the harvesting configuration menu; allowing users to select something that's not going to work is bad).
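
Such a filter could be as small as the Python sketch below. It is only an illustration of the idea, not Dataverse code: the function name is hypothetical, and the supported set is an assumption based on the formats reported as working elsewhere in this thread ("oai_dc", "oai_ddi", "dataverse_json").

```python
# Assumption: prefixes the importer can actually handle.
SUPPORTED_PREFIXES = frozenset({"oai_dc", "oai_ddi", "dataverse_json"})


def filter_supported_formats(advertised, supported=SUPPORTED_PREFIXES):
    """Keep only supported prefixes, preserving order and dropping duplicates."""
    seen = set()
    kept = []
    for prefix in advertised:
        if prefix in supported and prefix not in seen:
            seen.add(prefix)
            kept.append(prefix)
    return kept


# Hypothetical list of prefixes advertised by a remote server.
print(filter_supported_formats(["oai_dc", "oai_datacite4", "datacite", "oai_dc"]))
```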

@mreekie added and removed the label "NIH OTA: 1.4.1" (4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwn) Oct 25, 2022
@mreekie

mreekie commented Nov 1, 2022

Notes - discussion today:

  • subgroup on metadata is looking like it's going to go with a subset of datacite format.
  • From the description this is a metadata import issue as opposed to harvesting.

@landreev changed the title from "Feature request : allow harvesting in oai_datacite format(s)" to "Add import for oai_datacite ("OpenAire") format (this will allow Dataverse to harvest this format too)" Nov 1, 2022
@pdurbin added the labels "Feature: API", "User Role: API User" (Makes use of APIs), and "Type: Feature" (a feature request) Oct 9, 2023
@pdurbin
Member

pdurbin commented Oct 29, 2024

I was just in a CAFE/RAPID/DesignSafe meeting and we talked about how we know that the "oai_dc" format works for harvesting from DataCite into Dataverse. There's a related issue about that here:

However, "oai_dc" is somewhat limited in the number of fields it supports (only 15 or so).

This issue #7727 seems to be about adding the ability to import or harvest the "oai_datacite" format as well. It already appears in the dropdown, as does a format called "Datacite":

Screenshot 2024-10-29 at 4 17 06 PM

The harvesting formats that I've heard work fine are "oai_dc", "oai_ddi" and "dataverse_json".

It sounds like "oai_datacite" doesn't work. I don't know if "Datacite" format works or not.

@cmbz added the GREI 3 Search and Browse label Nov 4, 2024
@cmbz

cmbz commented Nov 4, 2024

2024/11/04: Adding to GREI harvesting improvements list: IQSS/dataverse-pm#171
