dataset types (software, workflow, etc.) - initial support #10694

Merged on Sep 6, 2024 (56 commits)
830ea35
stub out dataset types #10517
pdurbin Jul 15, 2024
b3c2dce
persist "software" etc in new datasetType entity #10517
pdurbin Jul 15, 2024
9541d41
set datasetType using Semantic API #10517
pdurbin Jul 16, 2024
4f055b6
assert that importing software via JSON works #10517
pdurbin Jul 17, 2024
47c5b30
fix typo #10517
pdurbin Jul 17, 2024
cc68c7d
allow dataset type to be specified in DDI import #10517
pdurbin Jul 22, 2024
25b2ea5
list import with native json #10517
pdurbin Jul 22, 2024
2b83f22
make dataset type searchable and facetable #10517
pdurbin Jul 23, 2024
cfac9dc
improve sample data to look more like software or a workflow #10517
pdurbin Jul 23, 2024
3aab5c0
stop supporting setting of dataset type via DDI #10517
pdurbin Jul 24, 2024
c8adf25
remove enum and put dataset types in database instead #10517
pdurbin Jul 26, 2024
b5b8c40
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Jul 26, 2024
bf668a9
fix sql script, add column etc #10517
pdurbin Jul 29, 2024
57b410f
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Jul 29, 2024
067d416
make dataset types translatable and capitalized consisently #10517
pdurbin Jul 29, 2024
d747959
rename datasetType_s to datasetType #10517
pdurbin Jul 29, 2024
78a3c1a
fix sql script when adding foreign key constraint #10517
pdurbin Jul 29, 2024
8593d32
send to DataCite either Dataset, Software, or Workflow #10517
pdurbin Jul 29, 2024
771d85a
expose if feature flags are enabled or disabled via API #10732
pdurbin Jul 30, 2024
c69c3ae
improve error handling for dataset types #10517
pdurbin Jul 30, 2024
b47e62b
add docs for dataset types #10517
pdurbin Jul 31, 2024
680fe05
remove developer notes #10517
pdurbin Jul 31, 2024
b10301f
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Jul 31, 2024
2e81187
improve error handling when adding a dataset type #10517
pdurbin Jul 31, 2024
6c4b847
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Aug 5, 2024
dd7541f
move check to JsonParser #10517
pdurbin Aug 5, 2024
f2e1fa0
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Aug 6, 2024
d781833
remove "dataset-types" feature flag #10517
pdurbin Aug 8, 2024
617e218
hide "Dataset Type" facet if all one type #10517
pdurbin Aug 12, 2024
1beed5d
remove debug line #10517
pdurbin Aug 12, 2024
200a45a
don't add "software" or "workflow" with Flyway #10517
pdurbin Aug 12, 2024
f8e8c4f
remove deprecation notice from default constructor #10517
pdurbin Aug 13, 2024
867f548
support id or name for GET of dataset type #10517
pdurbin Aug 13, 2024
2fb4fa6
stop logging to actionlogrecord for dataset types #10517
pdurbin Aug 13, 2024
eb20155
prevent default dataset type from being deleted #10517
pdurbin Aug 13, 2024
1a834de
get rid of unneeded null checks #10517
pdurbin Aug 13, 2024
19e3e51
use proper JSON-LD for dataset type #10517
pdurbin Aug 13, 2024
67e9971
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Aug 13, 2024
6be46c6
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Aug 20, 2024
faace91
add copyField for datasetType #10517
pdurbin Aug 22, 2024
260ac3a
make Solr schema.xml instructions more generic #10517
pdurbin Aug 22, 2024
647121a
whoops, should have been removed as part of 200a45a #10517
pdurbin Aug 22, 2024
d865521
reformat to replace tabs with spaces, etc #10517
pdurbin Aug 22, 2024
9c44b30
simplify logic #10517
pdurbin Aug 22, 2024
987df46
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Aug 22, 2024
68e4a60
remove unused import #10517
pdurbin Aug 22, 2024
eabe8e2
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Aug 26, 2024
04f1c7c
name of dataset type cannot be only digits #10517
pdurbin Aug 26, 2024
42ff504
#10517 fix typo
sekmiller Aug 29, 2024
1bbf3b9
Merge branch 'develop' into 10517-dataset-types
qqmyers Aug 29, 2024
4842d92
remove unused imports #10517
pdurbin Sep 4, 2024
673d775
add test to assert capitalizataion of Dataset and Software #10517
pdurbin Sep 4, 2024
128c230
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Sep 4, 2024
78ca1a5
bump sql script version #10517
pdurbin Sep 4, 2024
373005a
Merge branch 'develop' into 10517-dataset-types #10517
pdurbin Sep 4, 2024
b7b9b7d
conditional INSERT of dataset type #10517
pdurbin Sep 4, 2024
2 changes: 2 additions & 0 deletions conf/solr/schema.xml
@@ -205,6 +205,7 @@
<field name="entityId" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="datasetVersionId" type="plong" stored="true" indexed="true" multiValued="false"/>
<field name="datasetType" type="string" stored="true" indexed="true" multiValued="false"/>

<!-- Added for Dataverse 4.0 alpha 1 to sort by name -->
<!-- https://redmine.hmdc.harvard.edu/issues/3482 -->
@@ -426,6 +427,7 @@
<copyField source="dvAlias" dest="_text_" maxChars="3000"/>
<copyField source="dvAffiliation" dest="_text_" maxChars="3000"/>
<copyField source="dsPersistentId" dest="_text_" maxChars="3000"/>
<copyField source="datasetType" dest="_text_" maxChars="3000"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster searching. -->
10 changes: 10 additions & 0 deletions doc/release-notes/10517-datasetType.md
@@ -0,0 +1,10 @@
### Initial Support for Dataset Types

Out of the box, all datasets have the type "dataset", but superusers can add additional types. At this time, the type can only be set at creation time via API. The types "dataset", "software", and "workflow" will be sent to DataCite when the dataset is published.

For details, see <https://dataverse-guide--10694.org.readthedocs.build/en/10694/user/dataset-management.html#dataset-types> and #10517. Please note that this feature is highly experimental and is expected to evolve.

Upgrade instructions
--------------------

Update your Solr schema.xml file to pick up the "datasetType" additions and do a full reindex.
82 changes: 82 additions & 0 deletions doc/sphinx-guides/source/_static/api/dataset-create-software.json
@@ -0,0 +1,82 @@
{
"datasetType": "software",
"datasetVersion": {
"license": {
"name": "CC0 1.0",
"uri": "http://creativecommons.org/publicdomain/zero/1.0"
},
"metadataBlocks": {
"citation": {
"fields": [
{
"value": "pyDataverse",
"typeClass": "primitive",
"multiple": false,
"typeName": "title"
},
{
"value": [
{
"authorName": {
"value": "Range, Jan",
"typeClass": "primitive",
"multiple": false,
"typeName": "authorName"
},
"authorAffiliation": {
"value": "University of Stuttgart",
"typeClass": "primitive",
"multiple": false,
"typeName": "authorAffiliation"
}
}
],
"typeClass": "compound",
"multiple": true,
"typeName": "author"
},
{
"value": [
{ "datasetContactEmail" : {
"typeClass": "primitive",
"multiple": false,
"typeName": "datasetContactEmail",
"value" : "jan@mailinator.com"
},
"datasetContactName" : {
"typeClass": "primitive",
"multiple": false,
"typeName": "datasetContactName",
"value": "Range, Jan"
}
}],
"typeClass": "compound",
"multiple": true,
"typeName": "datasetContact"
},
{
"value": [ {
"dsDescriptionValue":{
"value": "A Python module for Dataverse.",
"multiple":false,
"typeClass": "primitive",
"typeName": "dsDescriptionValue"
}}],
"typeClass": "compound",
"multiple": true,
"typeName": "dsDescription"
},
{
"value": [
"Computer and Information Science"
],
"typeClass": "controlledVocabulary",
"multiple": true,
"typeName": "subject"
}
],
"displayName": "Citation Metadata"
}
}
}
}
@@ -0,0 +1,16 @@
{
"http://purl.org/dc/terms/title": "Darwin's Finches",
"http://purl.org/dc/terms/subject": "Medicine, Health and Life Sciences",
"http://purl.org/dc/terms/creator": {
"https://dataverse.org/schema/citation/authorName": "Finch, Fiona",
"https://dataverse.org/schema/citation/authorAffiliation": "Birds Inc."
},
"https://dataverse.org/schema/citation/datasetContact": {
"https://dataverse.org/schema/citation/datasetContactEmail": "finch@mailinator.com",
"https://dataverse.org/schema/citation/datasetContactName": "Finch, Fiona"
},
"https://dataverse.org/schema/citation/dsDescription": {
"https://dataverse.org/schema/citation/dsDescriptionValue": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds."
},
"https://dataverse.org/schema/core#datasetType": "software"
}
166 changes: 166 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -744,6 +744,8 @@ To create a dataset, you must supply a JSON file that contains at least the foll
- Description Text
- Subject

.. _api-create-dataset-incomplete:

Submit Incomplete Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -801,6 +803,8 @@ The following is an example HTTP call with deactivated validation:

**Note:** You may learn about an instance's support for deposition of incomplete datasets via :ref:`info-incomplete-metadata`.

.. _api-create-dataset:

Submit Dataset
^^^^^^^^^^^^^^

@@ -830,6 +834,19 @@ You should expect an HTTP 200 ("OK") response and JSON indicating the database I

.. note:: Only a Dataverse installation account with superuser permissions is allowed to include files when creating a dataset via this API. Adding files this way only adds their file metadata to the database; you will need to manually add the physical files to the file system.

.. _api-create-dataset-with-type:

Create a Dataset with a Dataset Type (Software, etc.)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, datasets are given the type "dataset", but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type.

Follow :ref:`api-create-dataset` as normal but include a line like ``"datasetType": "software"`` in your JSON. You can check which types are supported by your installation using the :ref:`api-list-dataset-types` API endpoint.

Here is an example JSON file for reference: :download:`dataset-create-software.json <../_static/api/dataset-create-software.json>`.

See also :ref:`dataset-types`.
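
Before submitting, it can help to confirm which type a native JSON file requests. The sketch below uses a hypothetical helper, ``dataset_type_of`` (not part of Dataverse), that extracts the ``datasetType`` value with ``sed``, assuming the key and its value sit on one line as in ``dataset-create-software.json``; it falls back to "dataset" when the key is absent. The commented ``curl`` call assumes the create endpoint and ``$PARENT`` collection alias used under :ref:`api-create-dataset`.

.. code-block:: bash

  # Hypothetical helper: print the dataset type requested by a native JSON file.
  # Assumes "datasetType": "..." appears on a single line, as in dataset-create-software.json.
  dataset_type_of() {
    _dt=$(sed -n 's/.*"datasetType": *"\([^"]*\)".*/\1/p' "$1" | head -n 1)
    # No datasetType key means the default type, "dataset".
    echo "${_dt:-dataset}"
  }

  # Example usage (assumes API_TOKEN, SERVER_URL, and PARENT are exported):
  # dataset_type_of dataset-create-software.json   # prints: software
  # curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
  #   -X POST "$SERVER_URL/api/dataverses/$PARENT/datasets" \
  #   --upload-file dataset-create-software.json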

.. _api-import-dataset:

Import a Dataset into a Dataverse Collection
@@ -872,6 +889,18 @@ Before calling the API, make sure the data files referenced by the ``POST``\ ed
* This API endpoint does not support importing *files'* persistent identifiers.
* A Dataverse installation can import datasets with a valid PID that uses a different protocol or authority than said server is configured for. However, the server will not update the PID metadata on subsequent update and publish actions.

.. _import-dataset-with-type:

Import a Dataset with a Dataset Type (Software, etc.)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, datasets are given the type "dataset", but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type.

The same native JSON file as above under :ref:`api-create-dataset-with-type` can be used when importing a dataset.

Including ``"datasetType"`` in the JSON file is the only difference. Otherwise, follow :ref:`api-import-dataset` as normal.

See also :ref:`dataset-types`.
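
As a sketch, an import of a typed dataset might be assembled as follows. It assumes the ``:import`` endpoint with ``pid`` and ``release`` query parameters described under :ref:`api-import-dataset`; ``import_url`` is a hypothetical helper that only builds the URL, and the PID shown is a made-up placeholder. The type itself still comes from the ``datasetType`` field in the uploaded JSON.

.. code-block:: bash

  # Hypothetical helper: build the import URL for a collection alias ($1),
  # a persistent identifier ($2), and a release flag ($3, assumed "yes" or "no").
  import_url() {
    echo "$SERVER_URL/api/dataverses/$1/datasets/:import?pid=$2&release=$3"
  }

  # Example usage (assumes API_TOKEN and SERVER_URL are exported; the PID is a placeholder):
  # curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
  #   "$(import_url root doi:10.5072/FK2/EXAMPLE yes)" \
  #   --upload-file dataset-create-software.json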

Import a Dataset into a Dataverse Installation with a DDI file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -3039,6 +3068,98 @@ The API can also be used to reset the dataset to use the default/inherited value

The default will always be the same provider as for the dataset PID if that provider can generate new PIDs, and will be the PID Provider set for the collection or the global default otherwise.

.. _api-dataset-types:

Dataset Types
~~~~~~~~~~~~~

See :ref:`dataset-types` in the User Guide for an overview of the feature.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

.. _api-list-dataset-types:

List Dataset Types
^^^^^^^^^^^^^^^^^^

Show which dataset types are available.

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org

curl "$SERVER_URL/api/datasets/datasetTypes"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl "https://demo.dataverse.org/api/datasets/datasetTypes"

.. _api-list-dataset-type:

Get Dataset Type
^^^^^^^^^^^^^^^^

Show a dataset type by passing either its database id (e.g. "2") or its name (e.g. "software").

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export TYPE=software

curl "$SERVER_URL/api/datasets/datasetTypes/$TYPE"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl "https://demo.dataverse.org/api/datasets/datasetTypes/software"
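
Because this endpoint accepts either a database ID or a name, a name made only of digits would be ambiguous, which is presumably why :ref:`api-add-dataset-type` rejects such names. The hypothetical helper below, ``type_lookup_kind``, mirrors that distinction on the client side; it only reports how a value would likely be interpreted and is not how Dataverse itself resolves the path parameter.

.. code-block:: bash

  # Hypothetical helper: report whether a value would be read as an ID or a name.
  # Digits only -> database ID; anything else -> type name (names cannot be only digits).
  type_lookup_kind() {
    case "$1" in
      '') echo "invalid" ;;
      *[!0-9]*) echo "name" ;;
      *) echo "id" ;;
    esac
  }

  # Example usage:
  # type_lookup_kind 2          # prints: id
  # type_lookup_kind software   # prints: name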

.. _api-add-dataset-type:

Add Dataset Type
^^^^^^^^^^^^^^^^

Note: Before you add any types of your own, there should be a single type called "dataset". If you add "software" or "workflow", these types will be sent to DataCite (if you use DataCite). Otherwise, the only functionality you currently gain from adding types is an entry in the "Dataset Type" facet. Be advised that if you add a type other than "software" or "workflow", you will need to add your new type to your Bundle.properties file for it to appear in Title Case rather than lower case in the "Dataset Type" facet.

With all that said, we'll add a "software" type in the example below. This API endpoint is superuser only. The "name" of a type cannot be only digits.

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export JSON='{"name": "software"}'

curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" "$SERVER_URL/api/datasets/datasetTypes" -X POST -d "$JSON"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -H "Content-Type: application/json" "https://demo.dataverse.org/api/datasets/datasetTypes" -X POST -d '{"name": "software"}'
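
To catch the digits-only rule before calling the API, a client could validate the name first. This is a minimal sketch: ``valid_type_name`` and ``type_json`` are hypothetical helpers, the server remains the authority on what it accepts, and the payload is passed to ``curl`` in double quotes so the shell does not split the JSON at its space.

.. code-block:: bash

  # Hypothetical helper: reject empty names and names made only of digits,
  # mirroring the documented rule (the server still performs its own checks).
  valid_type_name() {
    case "$1" in
      '') return 1 ;;
      *[!0-9]*) return 0 ;;
      *) return 1 ;;
    esac
  }

  # Hypothetical helper: build the JSON body for this endpoint.
  # Assumes the name contains no double quotes or backslashes.
  type_json() {
    printf '{"name": "%s"}' "$1"
  }

  # Example usage (assumes API_TOKEN and SERVER_URL are exported):
  # if valid_type_name workflow; then
  #   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
  #     -X POST "$SERVER_URL/api/datasets/datasetTypes" -d "$(type_json workflow)"
  # fi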

.. _api-delete-dataset-type:

Delete Dataset Type
^^^^^^^^^^^^^^^^^^^

Superuser only. The default "dataset" type cannot be deleted.

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export TYPE_ID=3

curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/datasetTypes/$TYPE_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/datasets/datasetTypes/3"
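
Since the example above deletes by database ID, a script that accidentally passes a name would send a malformed request. The sketch below is a hypothetical guard (``delete_dataset_type`` is not a Dataverse command) that refuses non-numeric input before calling ``curl``; it assumes deletion is by ID as shown, and the server independently refuses to delete the default "dataset" type.

.. code-block:: bash

  # Hypothetical guard: only send a DELETE when given a numeric database ID.
  # Assumes API_TOKEN and SERVER_URL are exported, as in the example above.
  delete_dataset_type() {
    case "$1" in
      ''|*[!0-9]*)
        echo "expected a numeric dataset type ID, got: '$1'" >&2
        return 1 ;;
    esac
    curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE \
      "$SERVER_URL/api/datasets/datasetTypes/$1"
  }

  # Example usage:
  # delete_dataset_type 3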

Files
-----

@@ -5295,6 +5416,51 @@ Delete Database Setting
Delete the setting under ``name``::

DELETE http://$SERVER/api/admin/settings/$name

.. _list-all-feature-flags:

List All Feature Flags
~~~~~~~~~~~~~~~~~~~~~~

Experimental and preview features are sometimes hidden behind feature flags. See :ref:`feature-flags` in the Installation Guide for a list of flags and how to configure them.

This API endpoint provides a list of feature flags and "enabled" or "disabled" for each one.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

.. code-block:: bash

export SERVER_URL=http://localhost:8080

curl "$SERVER_URL/api/admin/featureFlags"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl "http://localhost:8080/api/admin/featureFlags"

.. _show-feature-flag-status:

Show Feature Flag Status
~~~~~~~~~~~~~~~~~~~~~~~~

This endpoint reports "enabled" as true or false for a single feature flag. (For all flags, see :ref:`list-all-feature-flags`.)

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

.. code-block:: bash

export SERVER_URL=http://localhost:8080
export FLAG=API_SESSION_AUTH

curl "$SERVER_URL/api/admin/featureFlags/$FLAG"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl "http://localhost:8080/api/admin/featureFlags/API_SESSION_AUTH"

Manage Banner Messages
~~~~~~~~~~~~~~~~~~~~~~
@@ -83,6 +83,7 @@ Note, this example uses the term URI directly rather than adding an ``@context``

You should expect a 200 ("OK") response indicating whether a draft Dataset version was created or an existing draft was updated.

.. _api-semantic-create-dataset:

Create a Dataset
----------------
@@ -105,4 +106,16 @@ With curl, this is done by adding the following header:
curl -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets --upload-file dataset-create.jsonld

An example JSON-LD file is available at :download:`dataset-create.jsonld <../_static/api/dataset-create.jsonld>`. The file :download:`dataset-create_en.jsonld <../_static/api/dataset-create_en.jsonld>` is a version that sets the metadata language (see :ref:`:MetadataLanguages`) to English (en).


.. _api-semantic-create-dataset-with-type:

Create a Dataset with a Dataset Type
------------------------------------

By default, datasets are given the type "dataset", but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type.

An example JSON-LD file is available at :download:`dataset-create-software.jsonld <../_static/api/dataset-create-software.jsonld>`.

You can use this file with the normal :ref:`api-semantic-create-dataset` endpoint above.

See also :ref:`dataset-types`.
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/installation/config.rst
@@ -3302,10 +3302,11 @@ please find all known feature flags below. Any of these flags can be activated u
- Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call.
- ``Off``


**Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development </container/dev-usage>`, you can set them in the `docker compose <https://docs.docker.com/compose/environment-variables/set-environment-variables/>`_ file.

To check the status of feature flags via API, see :ref:`list-all-feature-flags` in the API Guide.

.. _:ApplicationServerSettings:

Application Server Settings