Merge pull request #10694 from IQSS/10517-dataset-types
dataset types (software, workflow, etc.) - initial support
sekmiller authored Sep 6, 2024
2 parents 486dd55 + b7b9b7d commit 4143031
Showing 41 changed files with 2,415 additions and 1,225 deletions.
2 changes: 2 additions & 0 deletions conf/solr/schema.xml
@@ -205,6 +205,7 @@
<field name="entityId" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="datasetVersionId" type="plong" stored="true" indexed="true" multiValued="false"/>
<field name="datasetType" type="string" stored="true" indexed="true" multiValued="false"/>

<!-- Added for Dataverse 4.0 alpha 1 to sort by name -->
<!-- https://redmine.hmdc.harvard.edu/issues/3482 -->
@@ -426,6 +427,7 @@
<copyField source="dvAlias" dest="_text_" maxChars="3000"/>
<copyField source="dvAffiliation" dest="_text_" maxChars="3000"/>
<copyField source="dsPersistentId" dest="_text_" maxChars="3000"/>
<copyField source="datasetType" dest="_text_" maxChars="3000"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster searching. -->
10 changes: 10 additions & 0 deletions doc/release-notes/10517-datasetType.md
@@ -0,0 +1,10 @@
### Initial Support for Dataset Types

Out of the box, all datasets have the type "dataset" but superusers can add additional types. At this time the type can only be set at creation time via API. The types "dataset", "software", and "workflow" will be sent to DataCite when the dataset is published.

For details see <https://dataverse-guide--10694.org.readthedocs.build/en/10694/user/dataset-management.html#dataset-types> and #10517. Please note that this feature is highly experimental and is expected to evolve.

Upgrade instructions
--------------------

Update your Solr schema.xml file to pick up the "datasetType" additions and do a full reindex.
82 changes: 82 additions & 0 deletions doc/sphinx-guides/source/_static/api/dataset-create-software.json
@@ -0,0 +1,82 @@
{
"datasetType": "software",
"datasetVersion": {
"license": {
"name": "CC0 1.0",
"uri": "http://creativecommons.org/publicdomain/zero/1.0"
},
"metadataBlocks": {
"citation": {
"fields": [
{
"value": "pyDataverse",
"typeClass": "primitive",
"multiple": false,
"typeName": "title"
},
{
"value": [
{
"authorName": {
"value": "Range, Jan",
"typeClass": "primitive",
"multiple": false,
"typeName": "authorName"
},
"authorAffiliation": {
"value": "University of Stuttgart",
"typeClass": "primitive",
"multiple": false,
"typeName": "authorAffiliation"
}
}
],
"typeClass": "compound",
"multiple": true,
"typeName": "author"
},
{
"value": [
{ "datasetContactEmail" : {
"typeClass": "primitive",
"multiple": false,
"typeName": "datasetContactEmail",
"value" : "jan@mailinator.com"
},
"datasetContactName" : {
"typeClass": "primitive",
"multiple": false,
"typeName": "datasetContactName",
"value": "Range, Jan"
}
}],
"typeClass": "compound",
"multiple": true,
"typeName": "datasetContact"
},
{
"value": [ {
"dsDescriptionValue":{
"value": "A Python module for Dataverse.",
"multiple":false,
"typeClass": "primitive",
"typeName": "dsDescriptionValue"
}}],
"typeClass": "compound",
"multiple": true,
"typeName": "dsDescription"
},
{
"value": [
"Computer and Information Science"
],
"typeClass": "controlledVocabulary",
"multiple": true,
"typeName": "subject"
}
],
"displayName": "Citation Metadata"
}
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"http://purl.org/dc/terms/title": "Darwin's Finches",
"http://purl.org/dc/terms/subject": "Medicine, Health and Life Sciences",
"http://purl.org/dc/terms/creator": {
"https://dataverse.org/schema/citation/authorName": "Finch, Fiona",
"https://dataverse.org/schema/citation/authorAffiliation": "Birds Inc."
},
"https://dataverse.org/schema/citation/datasetContact": {
"https://dataverse.org/schema/citation/datasetContactEmail": "finch@mailinator.com",
"https://dataverse.org/schema/citation/datasetContactName": "Finch, Fiona"
},
"https://dataverse.org/schema/citation/dsDescription": {
"https://dataverse.org/schema/citation/dsDescriptionValue": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
},
"https://dataverse.org/schema/core#datasetType": "software"
}
166 changes: 166 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -744,6 +744,8 @@ To create a dataset, you must supply a JSON file that contains at least the foll
- Description Text
- Subject

.. _api-create-dataset-incomplete:

Submit Incomplete Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -801,6 +803,8 @@ The following is an example HTTP call with deactivated validation:
**Note:** You may learn about an instance's support for deposition of incomplete datasets via :ref:`info-incomplete-metadata`.

.. _api-create-dataset:

Submit Dataset
^^^^^^^^^^^^^^

@@ -830,6 +834,19 @@ You should expect an HTTP 200 ("OK") response and JSON indicating the database I

.. note:: Only a Dataverse installation account with superuser permissions is allowed to include files when creating a dataset via this API. Adding files this way only adds their file metadata to the database; you will need to manually add the physical files to the file system.

.. _api-create-dataset-with-type:

Create a Dataset with a Dataset Type (Software, etc.)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, datasets are given the type "dataset", but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type.

Follow :ref:`api-create-dataset` as normal but include a line like ``"datasetType": "software"`` in your JSON. You can check which types are supported by your installation using the :ref:`api-list-dataset-types` API endpoint.

Here is an example JSON file for reference: :download:`dataset-create-software.json <../_static/api/dataset-create-software.json>`.

See also :ref:`dataset-types`.
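
The two steps above (check which types exist, then create the dataset) can be sketched as a small shell function. This is only an illustration: the function name ``create_typed_dataset``, the collection alias ``root`` in the usage comment, and the plain ``grep`` match on the type name are assumptions, not part of Dataverse. The create call itself is the normal one from :ref:`api-create-dataset`.

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org

# Hypothetical helper (not part of Dataverse): create a dataset of a given type.
# $1 = alias of the parent collection, $2 = dataset type name, $3 = JSON file
create_typed_dataset() {
  parent=$1; dtype=$2; json_file=$3

  # Ask the installation which types it knows about. Matching the quoted name
  # with grep is a crude check that assumes the name appears verbatim in the reply.
  if ! curl -s "$SERVER_URL/api/datasets/datasetTypes" | grep -q "\"$dtype\""; then
    echo "dataset type '$dtype' is not available on $SERVER_URL" >&2
    return 1
  fi

  # Same call as a normal dataset creation; the type comes from the JSON file.
  curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
    -X POST "$SERVER_URL/api/dataverses/$parent/datasets" \
    --upload-file "$json_file"
}

# Example: create_typed_dataset root software dataset-create-software.json
```

The function only refuses to submit when the type is missing from the list; it does not verify that the JSON file actually contains the matching ``"datasetType"`` line.
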

.. _api-import-dataset:

Import a Dataset into a Dataverse Collection
@@ -872,6 +889,18 @@ Before calling the API, make sure the data files referenced by the ``POST``\ ed
* This API endpoint does not support importing *files'* persistent identifiers.
* A Dataverse installation can import datasets with a valid PID that uses a different protocol or authority than said server is configured for. However, the server will not update the PID metadata on subsequent update and publish actions.

.. _import-dataset-with-type:

Import a Dataset with a Dataset Type (Software, etc.)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, datasets are given the type "dataset", but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type.

The same native JSON file as above under :ref:`api-create-dataset-with-type` can be used when importing a dataset.

Using a file like this is the only difference. Otherwise, follow :ref:`api-import-dataset` as normal.

See also :ref:`dataset-types`.

Import a Dataset into a Dataverse Installation with a DDI file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -3039,6 +3068,98 @@ The API can also be used to reset the dataset to use the default/inherited value

The default will always be the same provider as for the dataset PID if that provider can generate new PIDs, and will be the PID Provider set for the collection or the global default otherwise.

.. _api-dataset-types:

Dataset Types
~~~~~~~~~~~~~

See :ref:`dataset-types` in the User Guide for an overview of the feature.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

.. _api-list-dataset-types:

List Dataset Types
^^^^^^^^^^^^^^^^^^

Show which dataset types are available.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

  curl "$SERVER_URL/api/datasets/datasetTypes"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/datasetTypes"

.. _api-list-dataset-type:

Get Dataset Type
^^^^^^^^^^^^^^^^

Show a dataset type by passing either its database id (e.g. "2") or its name (e.g. "software").

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export TYPE=software

  curl "$SERVER_URL/api/datasets/datasetTypes/$TYPE"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/datasets/datasetTypes/software"

.. _api-add-dataset-type:

Add Dataset Type
^^^^^^^^^^^^^^^^

Note: Before you add any types of your own, there should be a single type called "dataset". If you add "software" or "workflow", these types will be sent to DataCite (if you use DataCite). Otherwise, the only functionality you currently gain from adding types is an entry in the "Dataset Type" facet. Be advised that if you add a type other than "software" or "workflow", you will need to add your new type to your Bundle.properties file for it to appear in Title Case rather than lower case in the "Dataset Type" facet.

With all that said, we'll add a "software" type in the example below. This API endpoint is superuser only. The "name" of a type cannot be only digits.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export JSON='{"name": "software"}'

  curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" "$SERVER_URL/api/datasets/datasetTypes" -X POST -d "$JSON"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -H "Content-Type: application/json" "https://demo.dataverse.org/api/datasets/datasetTypes" -X POST -d '{"name": "software"}'

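
The rule that a name cannot be only digits can be checked on the client side before calling the API. The sketch below is one way to do that in plain shell. Both function names are hypothetical, and rejecting an empty name is an extra assumption that this guide does not state; the ``curl`` call is the same "Add Dataset Type" request shown above.

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org

# Hypothetical pre-flight check (not part of Dataverse): reject names that
# consist only of digits, which the API does not allow.
is_valid_type_name() {
  case "$1" in
    "")        return 1 ;;  # empty name (assumed to be invalid as well)
    *[!0-9]*)  return 0 ;;  # at least one non-digit character
    *)         return 1 ;;  # digits only, e.g. "123"
  esac
}

# Hypothetical wrapper around the "Add Dataset Type" call.
add_dataset_type() {
  if ! is_valid_type_name "$1"; then
    echo "invalid dataset type name: '$1'" >&2
    return 1
  fi
  curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
    "$SERVER_URL/api/datasets/datasetTypes" -X POST -d "{\"name\": \"$1\"}"
}

# Example: add_dataset_type software
```

A likely reason for the digits rule is that the lookup endpoint accepts either a database id or a name, so an all-digit name would be ambiguous; treat that as an inference rather than a documented guarantee.
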
.. _api-delete-dataset-type:

Delete Dataset Type
^^^^^^^^^^^^^^^^^^^

Superuser only.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export TYPE_ID=3

  curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/datasetTypes/$TYPE_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/datasets/datasetTypes/3"

Files
-----

@@ -5295,6 +5416,51 @@ Delete Database Setting

Delete the setting under ``name``::

  DELETE http://$SERVER/api/admin/settings/$name

.. _list-all-feature-flags:

List All Feature Flags
~~~~~~~~~~~~~~~~~~~~~~

Experimental and preview features are sometimes hidden behind feature flags. See :ref:`feature-flags` in the Installation Guide for a list of flags and how to configure them.

This API endpoint provides a list of feature flags and "enabled" or "disabled" for each one.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

.. code-block:: bash

  export SERVER_URL=http://localhost:8080

  curl "$SERVER_URL/api/admin/featureFlags"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "http://localhost:8080/api/admin/featureFlags"

.. _show-feature-flag-status:

Show Feature Flag Status
~~~~~~~~~~~~~~~~~~~~~~~~

This endpoint reports "enabled" as true or false for a single feature flag. (For all flags, see :ref:`list-all-feature-flags`.)

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

.. code-block:: bash

  export SERVER_URL=http://localhost:8080
  export FLAG=DATASET_TYPES

  curl "$SERVER_URL/api/admin/featureFlags/$FLAG"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "http://localhost:8080/api/admin/featureFlags/DATASET_TYPES"

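
In a script, the single-flag endpoint can be turned into a yes/no test. The sketch below is a rough illustration only: the function name ``feature_flag_enabled`` is hypothetical, and the exact layout of the JSON reply is not shown in this guide, so the code merely assumes that the reply contains an ``"enabled"`` key followed by a literal ``true`` or ``false``.

```shell
export SERVER_URL=http://localhost:8080

# Hypothetical helper (not part of Dataverse): succeed if a feature flag is on.
# Assumption: the JSON reply contains "enabled" followed by true or false;
# any other reply (including an error) is treated as "not enabled".
feature_flag_enabled() {
  curl -s "$SERVER_URL/api/admin/featureFlags/$1" \
    | grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true'
}

# Example:
#   if feature_flag_enabled DATASET_TYPES; then echo "on"; else echo "off"; fi
```

Matching with ``grep`` keeps the sketch free of extra tools; a real script would more likely parse the reply with a JSON-aware tool.
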
Manage Banner Messages
~~~~~~~~~~~~~~~~~~~~~~
Original file line number Diff line number Diff line change
@@ -83,6 +83,7 @@ Note, this example uses the term URI directly rather than adding an ``@context``

You should expect a 200 ("OK") response indicating whether a draft Dataset version was created or an existing draft was updated.

.. _api-semantic-create-dataset:

Create a Dataset
----------------
@@ -105,4 +106,16 @@ With curl, this is done by adding the following header:
curl -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets --upload-file dataset-create.jsonld
An example jsonld file is available at :download:`dataset-create.jsonld <../_static/api/dataset-create.jsonld>` (:download:`dataset-create_en.jsonld <../_static/api/dataset-create.jsonld>` is a version that sets the metadata language (see :ref:`:MetadataLanguages`) to English (en).)


.. _api-semantic-create-dataset-with-type:

Create a Dataset with a Dataset Type
------------------------------------

By default, datasets are given the type "dataset", but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type.

An example JSON-LD file is available at :download:`dataset-create-software.jsonld <../_static/api/dataset-create-software.jsonld>`.

You can use this file with the normal :ref:`api-semantic-create-dataset` endpoint above.

See also :ref:`dataset-types`.
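
As a minimal sketch, the call from :ref:`api-semantic-create-dataset` can be wrapped in a shell function and pointed at the example file. The function name ``create_dataset_jsonld`` and the collection alias ``root`` in the usage comment are assumptions made for illustration; the headers and URL are the same as in the normal JSON-LD create call.

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org

# Hypothetical wrapper (not part of Dataverse).
# $1 = id or alias of the parent collection, $2 = JSON-LD file to upload
create_dataset_jsonld() {
  curl -H "X-Dataverse-key:$API_TOKEN" -H 'Content-Type: application/ld+json' \
    -X POST "$SERVER_URL/api/dataverses/$1/datasets" --upload-file "$2"
}

# Example: create_dataset_jsonld root dataset-create-software.jsonld
```

The dataset type is taken from the ``https://dataverse.org/schema/core#datasetType`` entry in the uploaded file, so nothing type-specific appears on the command line.
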
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/installation/config.rst
@@ -3302,10 +3302,11 @@ please find all known feature flags below. Any of these flags can be activated u
- Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call.
- ``Off``


**Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development </container/dev-usage>`, you can set them in the `docker compose <https://docs.docker.com/compose/environment-variables/set-environment-variables/>`_ file.

To check the status of feature flags via API, see :ref:`list-all-feature-flags` in the API Guide.

.. _:ApplicationServerSettings:

Application Server Settings