Merge branch 'IQSS:develop' into 9683_get_dataset_api_in_single_query
ErykKul committed Jun 28, 2023
2 parents 2968ba2 + 8b4100d commit c8a1352
Showing 20 changed files with 398 additions and 17 deletions.
3 changes: 3 additions & 0 deletions doc/release-notes/8889-filepids-in-collections.md
@@ -0,0 +1,3 @@
It is now possible to configure registering PIDs for files in individual collections.

For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the [:FilePIDsEnabled](https://guides.dataverse.org/en/latest/installation/config.html#filepidsenabled) section of the Configuration guide for details.
25 changes: 21 additions & 4 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -153,15 +153,32 @@ Mint a PID for a File That Does Not Have One
In the following example, the database id of the file is 42::

export FILE_ID=42
curl http://localhost:8080/api/admin/$FILE_ID/registerDataFile
curl "http://localhost:8080/api/admin/$FILE_ID/registerDataFile"

Mint PIDs for Files That Do Not Have Them
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mint PIDs for all unregistered published files in the specified collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a large number of files, you might want to consider minting PIDs for files individually using the ``registerDataFile`` endpoint above in a for loop, sleeping between each registration::
The following API will register the PIDs for all published files that are not yet registered in the datasets **directly within the collection** specified by its alias::

curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}"

It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well. File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)

This API will sleep for 1 second between registration calls by default. A longer sleep interval can be specified with an optional ``sleep=`` parameter::

curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}?sleep=5"
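
Because the endpoint only covers datasets directly inside the named collection, covering a subtree means one call per collection. The following is a minimal Java sketch of that loop, not part of this commit. The collection aliases are hypothetical, the base URL is the ``localhost:8080`` address used in the curl examples above, and any access restrictions your installation places on the admin API are not handled. By default it only prints the URLs it would call; passing ``--send`` performs the requests.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RegisterFilesPerCollection {
    // Base URL taken from the curl examples above; adjust for your installation.
    static final String BASE = "http://localhost:8080/api/admin/registerDataFiles/";

    /** Builds the endpoint URI for one collection alias and a sleep interval in seconds. */
    static URI endpointFor(String alias, int sleepSeconds) {
        // Encoding is a precaution; simple alphanumeric aliases pass through unchanged.
        String encoded = URLEncoder.encode(alias, StandardCharsets.UTF_8);
        return URI.create(BASE + encoded + "?sleep=" + sleepSeconds);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical aliases: a parent collection followed by its sub-collections.
        List<String> aliases = List.of("parentCollection", "subCollectionA", "subCollectionB");
        boolean send = args.length > 0 && args[0].equals("--send");
        HttpClient client = HttpClient.newHttpClient();
        for (String alias : aliases) {
            URI uri = endpointFor(alias, 5);
            if (!send) {
                // Dry run: show what would be requested without touching the server.
                System.out.println("Would call: " + uri);
                continue;
            }
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            // Each call blocks until the server has worked through the collection's files.
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(alias + ": HTTP " + response.statusCode() + " " + response.body());
        }
    }
}
```

Calling the collections one after another, rather than in parallel, keeps the load on the PID provider bounded by the ``sleep`` interval.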

Mint PIDs for ALL unregistered files in the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will attempt to register the PIDs for all the published files in your instance that do not yet have them::

curl http://localhost:8080/api/admin/registerDataFileAll

The application will attempt to sleep for 1 second between registration attempts so as not to overload your persistent identifier service provider. Note that if you have a large number of files that need to be registered in your Dataverse, you may want to consider minting file PIDs within individual collections, or even for individual files, using the ``registerDataFiles`` and/or ``registerDataFile`` endpoints above in a loop, with a longer sleep interval between calls.



Mint a New DOI for a Dataset with a Handle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

18 changes: 18 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -738,6 +738,24 @@ The fully expanded example above (without environment variables) looks like this
curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/guestbookResponses?guestbookId=1 -o myResponses.csv
.. _collection-attributes-api:

Change Collection Attributes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block::

  curl -X PUT -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/attribute/$ATTRIBUTE?value=$VALUE"

The following attributes are supported:

* ``alias`` Collection alias
* ``name`` Name
* ``description`` Description
* ``affiliation`` Affiliation
* ``filePIDsEnabled`` ("true" or "false") Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).
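
As an illustration of the request shape described above, here is a minimal Java sketch that builds (but does not send) the PUT request for the ``filePIDsEnabled`` attribute. The class and method names are hypothetical; the server URL, the placeholder token, and the ``root`` alias are taken from the demo examples earlier in this guide.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class CollectionAttributeRequest {
    /** Builds the PUT request for one attribute change; nothing is sent here. */
    static HttpRequest build(String serverUrl, String apiToken, String collectionId,
                             String attribute, String value) {
        // The new value travels in the "value" query parameter, not in the request body.
        String encodedValue = URLEncoder.encode(value, StandardCharsets.UTF_8);
        URI uri = URI.create(serverUrl + "/api/dataverses/" + collectionId
                + "/attribute/" + attribute + "?value=" + encodedValue);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://demo.dataverse.org",
                "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "root", "filePIDsEnabled", "true");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually perform the change, the request could be passed to ``HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())``.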


Datasets
--------

5 changes: 3 additions & 2 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2766,13 +2766,14 @@ timestamps.
:FilePIDsEnabled
++++++++++++++++

Toggles publishing of file-based PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.
Toggles publishing of file-level PIDs for the entire installation. By default this setting is absent and Dataverse Software assumes it to be true. If enabled, the registration will be performed asynchronously (in the background) during publishing of a dataset.

If you don't want to register file-based PIDs for your installation, set:

``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FilePIDsEnabled``

Note: File-level PID registration was added in Dataverse Software 4.9; it could not be disabled until Dataverse Software 4.9.3.

It is possible to override the installation-wide setting for specific collections. For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See :ref:`collection-attributes-api` for details.

.. _:IndependentHandleService:

12 changes: 12 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
@@ -191,6 +191,18 @@ public List<DataFile> findByDatasetId(Long studyId) {
.setParameter("studyId", studyId).getResultList();
}

/**
*
* @param collectionId numeric id of the parent collection ("dataverse")
* @return list of files in the datasets that are *direct* children of the collection specified
* (i.e., no datafiles in sub-collections of this collection will be included)
*/
public List<DataFile> findByDirectCollectionOwner(Long collectionId) {
String queryString = "select f from DataFile f, Dataset d where f.owner.id = d.id and d.owner.id = :collectionId order by f.id";
return em.createQuery(queryString, DataFile.class)
.setParameter("collectionId", collectionId).getResultList();
}

public List<DataFile> findAllRelatedByRootDatafileId(Long datafileId) {
/*
Get all files with the same root datafile id
28 changes: 27 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/Dataverse.java
@@ -590,8 +590,34 @@ public void setCitationDatasetFieldTypes(List<DatasetFieldType> citationDatasetF
this.citationDatasetFieldTypes = citationDatasetFieldTypes;
}


/**
* @Note: this setting is Nullable, with {@code null} indicating that the
* desired behavior is not explicitly configured for this specific collection.
* See the comment below.
*/
@Column(nullable = true)
private Boolean filePIDsEnabled;

/**
* Specifies whether the PIDs for Datafiles should be registered when publishing
* datasets in this Collection, if the behavior is explicitly configured.
* @return {@code Boolean.TRUE} if explicitly enabled, {@code Boolean.FALSE} if explicitly disabled.
* {@code null} indicates that the behavior is not explicitly defined, in which
* case the behavior should follow the explicit configuration of the first
* direct ancestor collection, or the instance-wide configuration, if none
* present.
* @Note: If present, this configuration therefore by default applies to all
* the sub-collections, unless explicitly overwritten there.
* @author landreev
*/
public Boolean getFilePIDsEnabled() {
return filePIDsEnabled;
}

public void setFilePIDsEnabled(boolean filePIDsEnabled) {
this.filePIDsEnabled = filePIDsEnabled;
}

public List<DataverseFacet> getDataverseFacets() {
return getDataverseFacets(false);
}
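
The Javadoc above describes a three-state setting: explicitly enabled, explicitly disabled, or ``null`` meaning "inherit". The lookup that applies this rule (``isFilePIDsEnabledForCollection``) is not shown in this part of the diff, so the following is only a sketch of the described behavior under stated assumptions. ``Node`` is a hypothetical stand-in for a collection with a parent link, not the real ``Dataverse`` entity.

```java
public class FilePidResolutionSketch {
    /** Hypothetical stand-in for a collection; not the real Dataverse entity. */
    static final class Node {
        final Node parent;              // null for the root collection
        final Boolean filePIDsEnabled;  // null means "not explicitly configured"

        Node(Node parent, Boolean filePIDsEnabled) {
            this.parent = parent;
            this.filePIDsEnabled = filePIDsEnabled;
        }
    }

    /**
     * Walks from the collection up through its ancestors and returns the first
     * explicit setting found; falls back to the instance-wide value otherwise.
     */
    static boolean resolve(Node collection, boolean instanceWideDefault) {
        for (Node c = collection; c != null; c = c.parent) {
            if (c.filePIDsEnabled != null) {
                return c.filePIDsEnabled;
            }
        }
        return instanceWideDefault;
    }

    public static void main(String[] args) {
        Node root = new Node(null, null);          // nothing configured
        Node lab = new Node(root, Boolean.TRUE);   // explicitly enabled
        Node sub = new Node(lab, null);            // inherits from lab
        Node optOut = new Node(lab, Boolean.FALSE); // explicitly disabled

        System.out.println(resolve(sub, false));    // true: inherited from lab
        System.out.println(resolve(root, false));   // false: instance-wide value
        System.out.println(resolve(optOut, true));  // false: explicit override wins
    }
}
```

Note that the setter in the diff above takes a primitive ``boolean``, so a value set through it cannot be returned to the "inherit" (``null``) state by the same route.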
90 changes: 89 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@@ -1376,7 +1376,7 @@ public Response fixMissingOriginalTypes() {
"All the tabular files in the database already have the original types set correctly; exiting.");
} else {
for (Long fileid : affectedFileIds) {
logger.info("found file id: " + fileid);
logger.fine("found file id: " + fileid);
}
info.add("message", "Found " + affectedFileIds.size()
+ " tabular files with missing original types. Kicking off an async job that will repair the files in the background.");
@@ -1566,6 +1566,12 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) {
} catch (Exception e) {
logger.info("Unexpected Exception: " + e.getMessage());
}

try {
Thread.sleep(1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}
}
logger.info("Final Results:");
logger.info(alreadyRegistered + " of " + count + " files were already registered. " + new Date());
@@ -1577,6 +1583,88 @@ public Response registerDataFileAll(@Context ContainerRequestContext crc) {
return ok("Datafile registration complete." + successes + " of " + released
+ " unregistered, published files registered successfully.");
}

@GET
@AuthRequired
@Path("/registerDataFiles/{alias}")
public Response registerDataFilesInCollection(@Context ContainerRequestContext crc, @PathParam("alias") String alias, @QueryParam("sleep") Integer sleepInterval) {
Dataverse collection;
try {
collection = findDataverseOrDie(alias);
} catch (WrappedResponse r) {
return r.getResponse();
}

AuthenticatedUser superuser = authSvc.getAdminUser();
if (superuser == null) {
return error(Response.Status.INTERNAL_SERVER_ERROR, "Cannot find the superuser to execute /admin/registerDataFiles.");
}

if (!systemConfig.isFilePIDsEnabledForCollection(collection)) {
return ok("Registration of file-level pid is disabled in collection "+alias+"; nothing to do");
}

List<DataFile> dataFiles = fileService.findByDirectCollectionOwner(collection.getId());
Integer count = dataFiles.size();
Integer countSuccesses = 0;
Integer countAlreadyRegistered = 0;
Integer countReleased = 0;
Integer countDrafts = 0;

if (sleepInterval == null) {
sleepInterval = 1;
} else if (sleepInterval.intValue() < 1) {
return error(Response.Status.BAD_REQUEST, "Invalid sleep interval: "+sleepInterval);
}

logger.info("Starting to register: analyzing " + count + " files. " + new Date());
logger.info("Only unregistered, published files will be registered.");



for (DataFile df : dataFiles) {
try {
if ((df.getIdentifier() == null || df.getIdentifier().isEmpty())) {
if (df.isReleased()) {
countReleased++;
DataverseRequest r = createDataverseRequest(superuser);
execCommand(new RegisterDvObjectCommand(r, df));
countSuccesses++;
if (countSuccesses % 100 == 0) {
logger.info(countSuccesses + " out of " + count + " files registered successfully. " + new Date());
}
} else {
countDrafts++;
logger.fine(countDrafts + " out of " + count + " files not yet published");
}
} else {
countAlreadyRegistered++;
logger.fine(countAlreadyRegistered + " out of " + count + " files are already registered. " + new Date());
}
} catch (WrappedResponse ex) {
countReleased++;
logger.info("Failed to register file id: " + df.getId());
Logger.getLogger(Datasets.class.getName()).log(Level.SEVERE, null, ex);
} catch (Exception e) {
logger.info("Unexpected Exception: " + e.getMessage());
}

try {
Thread.sleep(sleepInterval * 1000);
} catch (InterruptedException ie) {
logger.warning("Interrupted Exception when attempting to execute Thread.sleep()!");
}
}

logger.info(countAlreadyRegistered + " out of " + count + " files were already registered. " + new Date());
logger.info(countDrafts + " out of " + count + " files are not yet published. " + new Date());
logger.info(countReleased + " out of " + count + " unregistered, published files to register. " + new Date());
logger.info(countSuccesses + " out of " + countReleased + " unregistered, published files registered successfully. "
+ new Date());

return ok("Datafile registration complete. " + countSuccesses + " out of " + countReleased
+ " unregistered, published files registered successfully.");
}

@GET
@AuthRequired
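
The handling of the optional ``sleep`` parameter in ``registerDataFilesInCollection`` above (default of 1 second, rejection of values below 1) can be isolated as a small helper. This is a sketch, not the committed code: the class and method names are hypothetical, and it deliberately differs from the method above by converting to milliseconds as a ``long``, which avoids the ``int`` overflow that ``sleepInterval * 1000`` would hit for very large values.

```java
import java.util.concurrent.TimeUnit;

public class SleepIntervalSketch {
    /**
     * Converts the optional "sleep" query parameter (in seconds) to milliseconds.
     * A missing value defaults to 1 second; values below 1 are rejected.
     */
    static long sleepMillis(Integer requestedSeconds) {
        if (requestedSeconds == null) {
            return TimeUnit.SECONDS.toMillis(1);
        }
        if (requestedSeconds < 1) {
            throw new IllegalArgumentException("Invalid sleep interval: " + requestedSeconds);
        }
        // toMillis works on long values, so large intervals do not wrap around.
        return TimeUnit.SECONDS.toMillis(requestedSeconds);
    }

    public static void main(String[] args) {
        System.out.println(sleepMillis(null)); // 1000
        System.out.println(sleepMillis(5));    // 5000
    }
}
```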
67 changes: 66 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
@@ -82,6 +82,7 @@

import edu.harvard.iq.dataverse.util.json.JSONLDUtil;
import edu.harvard.iq.dataverse.util.json.JsonParseException;
import edu.harvard.iq.dataverse.util.json.JsonPrinter;
import static edu.harvard.iq.dataverse.util.json.JsonPrinter.brief;
import java.io.StringReader;
import java.util.Collections;
@@ -129,6 +130,7 @@
import java.util.Optional;
import java.util.stream.Collectors;
import javax.servlet.http.HttpServletResponse;
import javax.validation.constraints.NotNull;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.StreamingOutput;
@@ -166,7 +168,7 @@ public class Dataverses extends AbstractApiBean {

@EJB
SwordServiceBean swordService;

@POST
@AuthRequired
public Response addRoot(@Context ContainerRequestContext crc, String body) {
@@ -590,6 +592,69 @@ public Response deleteDataverse(@Context ContainerRequestContext crc, @PathParam
}, getRequestUser(crc));
}

/**
* Endpoint to change attributes of a Dataverse collection.
*
* @apiNote Example curl command:
* <code>curl -X PUT "http://localhost:8080/api/dataverses/$ALIAS/attribute/alias?value=test"</code>
* to change the alias of the collection named $ALIAS to "test".
*/
@PUT
@AuthRequired
@Path("{identifier}/attribute/{attribute}")
public Response updateAttribute(@Context ContainerRequestContext crc, @PathParam("identifier") String identifier,
@PathParam("attribute") String attribute, @QueryParam("value") String value) {
try {
Dataverse collection = findDataverseOrDie(identifier);
User user = getRequestUser(crc);
DataverseRequest dvRequest = createDataverseRequest(user);

// TODO: The cases below use hard coded strings, because we have no place for definitions of those!
// They are taken from util.json.JsonParser / util.json.JsonPrinter. This shall be changed.
// This also should be extended to more attributes, like the type, theme, contacts, some booleans, etc.
switch (attribute) {
case "alias":
collection.setAlias(value);
break;
case "name":
collection.setName(value);
break;
case "description":
collection.setDescription(value);
break;
case "affiliation":
collection.setAffiliation(value);
break;
/* commenting out the code from the draft pr #9462:
case "versionPidsConduct":
CollectionConduct conduct = CollectionConduct.findBy(value);
if (conduct == null) {
return badRequest("'" + value + "' is not one of [" +
String.join(",", CollectionConduct.asList()) + "]");
}
collection.setDatasetVersionPidConduct(conduct);
break;
*/
case "filePIDsEnabled":
collection.setFilePIDsEnabled(parseBooleanOrDie(value));
break;
default:
return badRequest("'" + attribute + "' is not a supported attribute");
}

// Off to persistence layer
execCommand(new UpdateDataverseCommand(collection, null, null, dvRequest, null));

// Also return modified collection to user
return ok("Update successful", JsonPrinter.json(collection));

// TODO: This is an anti-pattern, necessary due to this bean being an EJB, causing very noisy and unnecessary
// logging by the EJB container for bubbling exceptions. (It would be handled by the error handlers.)
} catch (WrappedResponse e) {
return e.getResponse();
}
}

@DELETE
@AuthRequired
@Path("{linkingDataverseId}/deleteLink/{linkedDataverseId}")
@@ -645,7 +645,7 @@ private boolean runAddReplacePhase1(Dataset owner,
df.setRootDataFileId(fileToReplace.getRootDataFileId());
}
// Reuse any file PID during a replace operation (if File PIDs are in use)
if (systemConfig.isFilePIDsEnabled()) {
if (systemConfig.isFilePIDsEnabledForCollection(owner.getOwner())) {
df.setGlobalId(fileToReplace.getGlobalId());
df.setGlobalIdCreateTime(fileToReplace.getGlobalIdCreateTime());
// Should be true or fileToReplace wouldn't have an identifier (since it's not
@@ -366,7 +366,7 @@ private void publicizeExternalIdentifier(Dataset dataset, CommandContext ctxt) t
String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, "");
String currentGlobalAuthority = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Authority, "");
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabled();
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabledForCollection(getDataset().getOwner());
// We will skip trying to register the global identifiers for datafiles
// if "dependent" file-level identifiers are requested, AND the naming
// protocol, or the authority of the dataset global id is different from
@@ -135,7 +135,7 @@ public PublishDatasetResult execute(CommandContext ctxt) throws CommandException
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean registerGlobalIdsForFiles =
(currentGlobalIdProtocol.equals(theDataset.getProtocol()) || dataFilePIDFormat.equals("INDEPENDENT"))
&& ctxt.systemConfig().isFilePIDsEnabled();
&& ctxt.systemConfig().isFilePIDsEnabledForCollection(theDataset.getOwner());

if ( registerGlobalIdsForFiles ){
registerGlobalIdsForFiles = currentGlobalAuthority.equals( theDataset.getAuthority() );
@@ -57,7 +57,7 @@ protected void executeImpl(CommandContext ctxt) throws CommandException {
// didn't need updating.
String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, "");
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabled();
boolean isFilePIDsEnabled = ctxt.systemConfig().isFilePIDsEnabledForCollection(target.getOwner());
// We will skip trying to update the global identifiers for datafiles if they
// aren't being used.
// If they are, we need to assure that there's an existing PID or, as when