Remap oai_dc fields dc:type and dc:date #10737

Merged (9 commits) on Sep 11, 2024
18 changes: 18 additions & 0 deletions doc/release-notes/8129-harvesting.md
@@ -0,0 +1,18 @@
### Remap oai_dc export and harvesting format fields: dc:type and dc:date

The `oai_dc` export and harvesting format has had the following fields remapped:

- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped to the field "Publication Date" or, if one has been set, to the field used for the citation date (see [Set Citation Date Field Type for a Dataset](https://guides.dataverse.org/en/6.3/api/native-api.html#set-citation-date-field-type-for-a-dataset)).
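
The new mapping rules can be sketched roughly as follows. This is a minimal illustration rather than the exporter's actual code: the class `DcMappingSketch`, both method names, and the plain `Map` of field values are hypothetical stand-ins for the DTO lookups the exporter performs.

```java
import java.util.Map;

// Hypothetical sketch of the remapped dc:date and dc:type rules.
class DcMappingSketch {

    // dc:date: start from the publication date. If a citation date field
    // has been set for the dataset, use that field's value instead.
    // (Assumption: field values are modeled as a simple name -> value map.)
    static String resolveDcDate(String publicationDate,
                                String citationDateFieldName,
                                Map<String, String> fieldValues) {
        String date = publicationDate;
        if (citationDateFieldName != null) {
            date = fieldValues.get(citationDateFieldName);
        }
        return date;
    }

    // dc:type: no longer taken from "Kind of Data"; always the word "Dataset".
    static String resolveDcType() {
        return "Dataset";
    }

    public static void main(String[] args) {
        Map<String, String> fields = Map.of("dateOfDeposit", "1999-12-31");
        // No citation date field set: the publication date is used.
        System.out.println(resolveDcDate("2024-09-11", null, fields));
        // Citation date field set to dateOfDeposit: that value is used.
        System.out.println(resolveDcDate("2024-09-11", "dateOfDeposit", fields));
        System.out.println(resolveDcType());
    }
}
```

Note that, as in this sketch, a citation date field that is set but has no value for a dataset would leave dc:date empty rather than falling back to the publication date.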

In order for these changes to be reflected in existing datasets, a [reexport all](https://guides.dataverse.org/en/6.3/admin/metadataexport.html#batch-exports-through-the-api) should be run.

For more information, please see #8129 and #10737.

### Backward incompatible changes

See the "Remap oai_dc export" section above.

### Upgrade instructions

In order for changes to the `oai_dc` metadata export format to be reflected in existing datasets, a [reexport all](https://guides.dataverse.org/en/6.3/admin/metadataexport.html#batch-exports-through-the-api) should be run.
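
As a rough sketch, the reexport could be triggered from Java like this. It assumes the admin API is reachable at `http://localhost:8080` and that the batch endpoint is `/api/admin/metadata/reExportAll`, as described in the linked guide section; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for triggering a "reexport all".
class ReExportAllSketch {

    // Builds a GET request against the batch reexport endpoint.
    // Assumption: baseUrl points at an installation whose admin API
    // is accessible from where this code runs.
    static HttpRequest buildReExportAllRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/metadata/reExportAll"))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildReExportAllRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```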
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1836,6 +1836,8 @@ The fully expanded example above (without environment variables) looks like this

.. note:: You cannot deaccession a dataset more than once. If you call this endpoint twice for the same dataset version, you will get a not found error on the second call, since the dataset you are looking for will no longer be published since it is already deaccessioned.

.. _set-citation-date-field:

Set Citation Date Field Type for a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -7,6 +7,8 @@

import com.google.gson.Gson;
import edu.harvard.iq.dataverse.DatasetFieldConstant;
import edu.harvard.iq.dataverse.DatasetFieldType;
import edu.harvard.iq.dataverse.DatasetServiceBean;
import edu.harvard.iq.dataverse.GlobalId;
import edu.harvard.iq.dataverse.api.dto.DatasetDTO;
import edu.harvard.iq.dataverse.api.dto.DatasetVersionDTO;
@@ -176,22 +178,41 @@ private static void createOAIDC(XMLStreamWriter xmlw, DatasetDTO datasetDto, Str

writeFullElementList(xmlw, dcFlavor+":"+"language", dto2PrimitiveList(version, DatasetFieldConstant.language));

String date = dto2Primitive(version, DatasetFieldConstant.productionDate);
if (date == null) {
date = datasetDto.getPublicationDate();
/**
* dc:date. "I suggest changing the Dataverse / DC Element (oai_dc)
* mapping, so that dc:date is mapped with Publication Date. This is
* also in line with citation recommendations. The publication date is
* the preferred date when citing research data; see, e.g., page 12 in
* The Tromsø Recommendations for Citation of Research Data in
* Linguistics; https://doi.org/10.15497/rda00040 ." --
* https://github.com/IQSS/dataverse/issues/8129
*
* However, if the citation date field has been set, use that.
*/
String date = datasetDto.getPublicationDate();
DatasetFieldType citationDataType = jakarta.enterprise.inject.spi.CDI.current().select(DatasetServiceBean.class).get().findByGlobalId(globalId.asString()).getCitationDateDatasetFieldType();
if (citationDataType != null) {
date = dto2Primitive(version, citationDataType.getName());
}
writeFullElement(xmlw, dcFlavor+":"+"date", date);

writeFullElement(xmlw, dcFlavor+":"+"date", date);

writeFullElement(xmlw, dcFlavor+":"+"contributor", dto2Primitive(version, DatasetFieldConstant.depositor));

writeContributorElement(xmlw, version, dcFlavor);

writeFullElementList(xmlw, dcFlavor+":"+"relation", dto2PrimitiveList(version, DatasetFieldConstant.relatedDatasets));

writeFullElementList(xmlw, dcFlavor+":"+"type", dto2PrimitiveList(version, DatasetFieldConstant.kindOfData));
/**
* dc:type. "Dublin Core (see
* https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/type
* ) recommends “to use a controlled vocabulary such as the DCMI Type
* Vocabulary” for dc:type." So we hard-coded it to "Dataset". See
* https://github.com/IQSS/dataverse/issues/8129
*/
writeFullElement(xmlw, dcFlavor+":"+"type", "Dataset");

writeFullElementList(xmlw, dcFlavor+":"+"source", dto2PrimitiveList(version, DatasetFieldConstant.dataSources));


}

88 changes: 83 additions & 5 deletions src/test/java/edu/harvard/iq/dataverse/api/DatasetsIT.java
@@ -630,8 +630,7 @@ public void testCreatePublishDestroyDataset() {
Response exportDatasetAsDublinCore = UtilIT.exportDataset(datasetPersistentId, "oai_dc", apiToken);
exportDatasetAsDublinCore.prettyPrint();
exportDatasetAsDublinCore.then().assertThat()
// FIXME: Get this working. See https://github.com/rest-assured/rest-assured/wiki/Usage#example-3---complex-parsing-and-validation
// .body("oai_dc:dc.find { it == 'dc:title' }.item", hasItems("Darwin's Finches"))
.body("oai_dc.title", is("Darwin's Finches"))
.statusCode(OK.getStatusCode());

Response exportDatasetAsDdi = UtilIT.exportDataset(datasetPersistentId, "ddi", apiToken);
@@ -1195,8 +1194,7 @@ public void testExport() {
Response exportDatasetAsDublinCore = UtilIT.exportDataset(datasetPersistentId, "oai_dc", apiToken);
exportDatasetAsDublinCore.prettyPrint();
exportDatasetAsDublinCore.then().assertThat()
// FIXME: Get this working. See https://github.com/rest-assured/rest-assured/wiki/Usage#example-3---complex-parsing-and-validation
// .body("oai_dc:dc.find { it == 'dc:title' }.item", hasItems("Darwin's Finches"))
.body("oai_dc.title", is("Dataset One"))
.statusCode(OK.getStatusCode());

Response exportDatasetAsDdi = UtilIT.exportDataset(datasetPersistentId, "ddi", apiToken);
@@ -4103,7 +4101,87 @@ public void getDatasetVersionCitation() {
.assertThat().body("data.message", containsString(String.valueOf(persistentId)));
}


@Test
public void testCitationDate() throws IOException {

Response createUser = UtilIT.createRandomUser();
createUser.then().assertThat().statusCode(OK.getStatusCode());
String username = UtilIT.getUsernameFromResponse(createUser);
String apiToken = UtilIT.getApiTokenFromResponse(createUser);

Response createDataverse = UtilIT.createRandomDataverse(apiToken);
createDataverse.then().assertThat().statusCode(CREATED.getStatusCode());
String dataverseAlias = UtilIT.getAliasFromResponse(createDataverse);
Integer dataverseId = UtilIT.getDataverseIdFromResponse(createDataverse);
Response createDataset = UtilIT.createRandomDatasetViaNativeApi(dataverseAlias, apiToken);
createDataset.then().assertThat().statusCode(CREATED.getStatusCode());
Integer datasetId = UtilIT.getDatasetIdFromResponse(createDataset);
String datasetPid = JsonPath.from(createDataset.getBody().asString()).getString("data.persistentId");

Path pathToAddDateOfDepositJson = Paths.get(java.nio.file.Files.createTempDirectory(null) + File.separator + "dateOfDeposit.json");
String dateOfDeposit = """
{
"fields": [
{
"typeName": "dateOfDeposit",
"value": "1999-12-31"
}
]
}
""";
java.nio.file.Files.write(pathToAddDateOfDepositJson, dateOfDeposit.getBytes());

Response addDateOfDeposit = UtilIT.addDatasetMetadataViaNative(datasetPid, pathToAddDateOfDepositJson.toString(), apiToken);
addDateOfDeposit.prettyPrint();
addDateOfDeposit.then().assertThat()
.statusCode(OK.getStatusCode())
.body("data.metadataBlocks.citation.fields[5].value", equalTo("1999-12-31"));

Response setCitationDate = UtilIT.setDatasetCitationDateField(datasetPid, "dateOfDeposit", apiToken);
setCitationDate.prettyPrint();
setCitationDate.then().assertThat().statusCode(OK.getStatusCode());

UtilIT.publishDataverseViaNativeApi(dataverseAlias, apiToken);
UtilIT.publishDatasetViaNativeApi(datasetId, "major", apiToken).then().assertThat().statusCode(OK.getStatusCode());

Response getCitationAfter = UtilIT.getDatasetVersionCitation(datasetId, DS_VERSION_LATEST_PUBLISHED, true, apiToken);
getCitationAfter.prettyPrint();

String doi = datasetPid.substring(4);

// Note that the year 1999 appears in the citation because we
// set the citation date field to a field that has that year.
String expectedCitation = "Finch, Fiona, 1999, \"Darwin's Finches\", <a href=\"https://doi.org/" + doi + "\" target=\"_blank\">https://doi.org/" + doi + "</a>, Root, V1";

getCitationAfter.then().assertThat()
.statusCode(OK.getStatusCode())
.body("data.message", is(expectedCitation));

Response exportDatasetAsDublinCore = UtilIT.exportDataset(datasetPid, "oai_dc", apiToken);
exportDatasetAsDublinCore.prettyPrint();
exportDatasetAsDublinCore.then().assertThat()
.body("oai_dc.type", equalTo("Dataset"))
.body("oai_dc.date", equalTo("1999-12-31"))
.statusCode(OK.getStatusCode());

Response clearDateField = UtilIT.clearDatasetCitationDateField(datasetPid, apiToken);
clearDateField.prettyPrint();
clearDateField.then().assertThat().statusCode(OK.getStatusCode());

// Clearing is not enough. You have to reexport because the previous date is cached.
Response rexport = UtilIT.reexportDatasetAllFormats(datasetPid);
rexport.prettyPrint();
rexport.then().assertThat().statusCode(OK.getStatusCode());

String todayDate = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
Response exportPostClear = UtilIT.exportDataset(datasetPid, "oai_dc", apiToken);
exportPostClear.prettyPrint();
exportPostClear.then().assertThat()
.body("oai_dc.type", equalTo("Dataset"))
.body("oai_dc.date", equalTo(todayDate))
.statusCode(OK.getStatusCode());
}

@Test
public void getVersionFiles() throws IOException, InterruptedException {
Response createUser = UtilIT.createRandomUser();
27 changes: 27 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
@@ -3717,6 +3717,33 @@ static Response getDatasetVersionCitation(Integer datasetId, String version, boo
return response;
}

static Response setDatasetCitationDateField(String datasetIdOrPersistentId, String dateField, String apiToken) {
String idInPath = datasetIdOrPersistentId; // Assume it's a number.
String optionalQueryParam = ""; // If idOrPersistentId is a number we'll just put it in the path.
if (!NumberUtils.isCreatable(datasetIdOrPersistentId)) {
idInPath = ":persistentId";
optionalQueryParam = "?persistentId=" + datasetIdOrPersistentId;
}
Response response = given()
.header(API_TOKEN_HTTP_HEADER, apiToken)
.body(dateField)
.put("/api/datasets/" + idInPath + "/citationdate" + optionalQueryParam);
return response;
}

static Response clearDatasetCitationDateField(String datasetIdOrPersistentId, String apiToken) {
String idInPath = datasetIdOrPersistentId; // Assume it's a number.
String optionalQueryParam = ""; // If idOrPersistentId is a number we'll just put it in the path.
if (!NumberUtils.isCreatable(datasetIdOrPersistentId)) {
idInPath = ":persistentId";
optionalQueryParam = "?persistentId=" + datasetIdOrPersistentId;
}
Response response = given()
.header(API_TOKEN_HTTP_HEADER, apiToken)
.delete("/api/datasets/" + idInPath + "/citationdate" + optionalQueryParam);
return response;
}

static Response getFileCitation(Integer fileId, String datasetVersion, String apiToken) {
Boolean includeDeaccessioned = null;
return getFileCitation(fileId, datasetVersion, includeDeaccessioned, apiToken);