Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129
Comments
I believe that @tcoupin fixed this in the following pull request, which we just merged and which will be available in the next version of Dataverse (5.13 as of this writing):
By the way, thank you @philippconzett for the extensive write-up! It's a lot to go through. Very thorough. 😄
2024/05/08
Another related issue:
The `oai_dc` export and harvesting format has had the following fields remapped:
- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped only to the field "Publication Date".
- dc:rights was not mapped to anything. Now it is mapped (when available) to terms of use, restrictions, and license.
@philippconzett (and any others watching this issue), I created a pull request to address the points you made above: Please take a look and feel free to leave comments or a review on the pull request. Thanks.
@pdurbin Thanks! I just left a comment on the PR.
* Remap oai_dc fields dc:type, dc:date, and dc:rights #8129. The `oai_dc` export and harvesting format has had the following fields remapped:
  - dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
  - dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped only to the field "Publication Date".
  - dc:rights was not mapped to anything. Now it is mapped (when available) to terms of use, restrictions, and license.
* add tests for export and citation date #8129
* map dc:date to pub date or field for citation date #8129
* back out of any changes to dc:rights #8129
* remove OAI-PMH changes from API changelog (also in release note) #8129
* tweak release note, mention backward incompatibility, reexport #8129
This issue was just closed because we merged the following pull request: As explained above, changes to dc:rights were not included in the scope of the pull request. Please see these issues instead:
Note: dc:rights is being handled in #5920 and #4176 but the original description of this issue has been preserved.
Based on a semi-systematic survey of how DataverseNO metadata is harvested in Bielefeld Academic Search Engine (BASE; https://www.base-search.net/Search/Advanced), a major search engine for research outputs, we have noticed some issues related to the way the Dataverse software provides Dublin Core metadata for OAI-PMH harvesting.
dc:type
BASE harvests multiple types of research output, e.g. publications and datasets. When searching BASE, you can limit the results to datasets by selecting Dataset in the Document Type section of the advanced search.
However, very few of the metadata records harvested directly from DataverseNO are marked as Document Type = Dataset.
It seems that in the oai_dc format, which BASE uses for harvesting, Document Type is based on the dc:type field. According to the Dataverse Metadata Crosswalk, dc:type corresponds to the Dataverse metadata field Kind of Data. But this field may contain very different values, e.g., “survey data”, “survey”, “observations” etc. Dublin Core (see https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/type) recommends “to use a controlled vocabulary such as the DCMI Type Vocabulary” for dc:type. The DCMI Type Vocabulary has “dataset” as one of its values. I therefore suggest changing the Dataverse / DC Element (oai_dc) mapping, so that dc:type is hard-coded as “dataset” for all dataset metadata in Dataverse.
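To make the dc:type problem concrete, here is a minimal Python sketch that reads an oai_dc record and reports whether its dc:type values belong to the DCMI Type Vocabulary. The sample records and the helper names (`dc_types`, `is_dataset`, `uncontrolled_types`) are made up for illustration, and the case-insensitive match on "Dataset" is only an assumption about how a harvester such as BASE might classify records, not a description of its actual logic.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The twelve classes of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def dc_types(record_xml):
    """Return the text of every dc:type element in an oai_dc record."""
    root = ET.fromstring(record_xml)
    return [el.text.strip() for el in root.iter(f"{{{DC_NS}}}type") if el.text]


def is_dataset(record_xml):
    """Assumed harvester rule: a record is a dataset if any dc:type says so."""
    return any(t.lower() == "dataset" for t in dc_types(record_xml))


def uncontrolled_types(record_xml):
    """Return dc:type values that are not DCMI Type Vocabulary terms."""
    known = {t.lower() for t in DCMI_TYPES}
    return [t for t in dc_types(record_xml) if t.lower() not in known]


# Hypothetical record using a free-text "Kind of Data" value as dc:type.
SAMPLE_KIND_OF_DATA = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example survey</dc:title>
  <dc:type>survey data</dc:type>
</oai_dc:dc>"""

# The same record with dc:type hard-coded as suggested in this issue.
SAMPLE_HARD_CODED = SAMPLE_KIND_OF_DATA.replace("survey data", "Dataset")
```

Under these assumptions, the first sample is not recognised as a dataset and "survey data" is flagged as uncontrolled, while the hard-coded variant passes both checks.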
dc:date
The Dataverse metadata field Publication Date is available as dcterms:issued, but it doesn’t seem to be among the oai_dc fields Dataverse exposes for OAI-PMH harvesting. According to the Dataverse Metadata Crosswalk, dc:date corresponds to the Dataverse metadata field Deposit Date, but all the random samples I tested in BASE indicate that dc:date, which BASE uses as input for their metadata field Year of Publication, corresponds to the Dataverse field Date of Production. I suggest changing the Dataverse / DC Element (oai_dc) mapping, so that dc:date is mapped to Publication Date. This is also in line with citation recommendations. The publication date is the preferred date when citing research data; see, e.g., page 12 in The Tromsø Recommendations for Citation of Research Data in Linguistics; https://doi.org/10.15497/rda00040.
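The effect of the two mappings on a harvester's Year of Publication can be sketched as follows. This is an illustration only: the dictionary keys (`productionDate`, `publicationDate`) and function names are hypothetical and do not reflect the Dataverse code base, and the "previous" rule encodes the behaviour the BASE samples suggest (Date of Production used when present, with Publication Date as the fallback).

```python
from typing import Optional


def dc_date_previous(dataset: dict) -> Optional[str]:
    """Observed behaviour: Production Date wins when it is present."""
    return dataset.get("productionDate") or dataset.get("publicationDate")


def dc_date_proposed(dataset: dict) -> Optional[str]:
    """Suggested behaviour: always use the Publication Date."""
    return dataset.get("publicationDate")


def year_of(date: Optional[str]) -> Optional[int]:
    """Extract the year from an ISO 8601 date string (YYYY or YYYY-MM-DD)."""
    return int(date[:4]) if date else None


# Hypothetical dataset: data produced in 2015, published in 2021.
example = {"productionDate": "2015-06-01", "publicationDate": "2021-03-15"}

previous_year = year_of(dc_date_previous(example))
proposed_year = year_of(dc_date_proposed(example))
```

For this made-up dataset, the previous rule would lead a harvester to report 2015 as the year of publication, whereas the proposed rule yields 2021, the year a citation should carry.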
dc:rights
For some of the sources included in BASE, there is an indication of the degree of Open Access. Among them are some Dataverse-based repositories. For DataverseNO and other Dataverse-based repositories, on the other hand, this information is not available / unknown (“unbekannt”).
The Open Access information in BASE is based on the Dublin Core field dc:rights. Dataverse does not provide the field dc:rights. A correct value in this field would enable BASE to indicate the degree of Open Access (see more information at https://www.base-search.net/about/en/faq_oai.php#dc-rights). For datasets without access restriction, the dc:rights field could look like this: info:eu-repo/semantics/openAccess (see more information at https://guidelines.openaire.eu/en/latest/data/field_rights.html#rightsuri-ma).
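As a rough illustration of the suggested dc:rights value, the sketch below builds the element with Python's standard library. The function name and the boolean `has_access_restrictions` flag are hypothetical. Only the unrestricted case is described in this issue, so the sketch deliberately emits nothing for restricted datasets rather than guessing at a value.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Value suggested in this issue for datasets without access restriction.
OPEN_ACCESS = "info:eu-repo/semantics/openAccess"


def rights_element(has_access_restrictions: bool) -> Optional[ET.Element]:
    """Return a dc:rights element for unrestricted datasets, else None.

    Which value to emit for restricted datasets is left open here,
    because the issue text only covers the unrestricted case.
    """
    if has_access_restrictions:
        return None
    element = ET.Element(f"{{{DC_NS}}}rights")
    element.text = OPEN_ACCESS
    return element


open_rights = rights_element(False)
open_rights_xml = ET.tostring(open_rights, encoding="unicode")
```

Serialising the result for an unrestricted dataset gives a single `dc:rights` element whose text is the `info:eu-repo/semantics/openAccess` URI, which is the form the BASE FAQ and the OpenAIRE guidelines linked above describe.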