
Releases: IQSS/dataverse

v5.4

05 Apr 16:29
ea91390

Dataverse Software 5.4

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. Please note that there is an API backwards compatibility issue in 5.4, and we recommend using 5.4.1 for any production environments.

Release Highlights

Deactivate Users API, Get User Traces API, Revoke Roles API

A new API has been added to deactivate users to prevent them from logging in, receiving communications, or otherwise being active in the system. Deactivating a user is an alternative to deleting a user, especially when the latter is not possible due to the amount of interaction the user has had with the Dataverse installation. In order to learn more about a user before deleting, deactivating, or merging, a new "get user traces" API is available that will show objects created, roles, group memberships, and more. Finally, the "remove all roles" button available in the superuser dashboard is now also available via API.
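As a rough sketch (the exact endpoints are documented in the Native API Guide; the server URL, token, and username below are placeholders), the three calls look like this:

    # deactivate a user (superuser API)
    curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/admin/authenticatedUsers/$USERNAME/deactivate"

    # get a user's traces (objects created, roles, group memberships, and more)
    curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/users/$USERNAME/traces"

    # remove all of a user's assigned roles
    curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/users/$USERNAME/removeRoles"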

New File Access API

A new API offers a crawlable view of the folders and files within a dataset:

/api/datasets/<dataset id>/dirindex/

will output a simple HTML listing, modeled on the standard Apache directory index, with Access API download links for individual files and recursive calls to the API above for sub-folders. Please see the Native API Guide for more information.

Using this API, wget --recursive (or a similar crawling client) can be used to download all the files in a dataset, preserving the file names and folder structure, without having to use the download-as-zip API. In addition to being faster (zipping is a relatively resource-intensive operation on the server side), this process can be restarted if interrupted (with wget --continue or equivalent), unlike zipped multi-file downloads, which always have to start from the beginning.
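For example, a sketch of such a crawl (the server URL and dataset database id are hypothetical; check the Native API Guide for the options recommended for your installation):

    # mirror the dataset's folder tree via the dirindex view
    # -nH drops the hostname; --cut-dirs=3 trims the api/datasets/<id> prefix
    wget -r -e robots=off -nH --cut-dirs=3 --content-disposition "https://demo.dataverse.org/api/datasets/24/dirindex/"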

On a system that uses S3 with download redirects, the individual file downloads will be handled by S3 directly (with the exception of tabular files), without having to be proxied through the Dataverse application.

Restricted Files and DDI "dataDscr" Information (Summary Statistics, Variable Names, Variable Labels)

In previous releases, DDI "dataDscr" information (summary statistics, variable names, and variable labels, sometimes known as "variable metadata") for successfully ingested tabular files was available even if the files were restricted. This has been changed in the following ways:

  • At the dataset level, DDI exports no longer show "dataDscr" information for restricted files. There is only one version of this export, suitable for public consumption, with the "dataDscr" information hidden for restricted files.
  • Similarly, at the dataset level, the DDI HTML Codebook no longer shows "dataDscr" information for restricted files.
  • At the file level, "dataDscr" information is no longer publicly available for restricted files. In practice, it was only possible to get this publicly via API (the download/access button was hidden).
  • At the file level, "dataDscr" (variable metadata) information can still be downloaded for restricted files if you have access to download the file.

Search with Accented Characters

Many languages include characters that have close analogs in ASCII, e.g. á, à, â, ç, é, è, ê, ë, í, ó, ö, ú, ù, û, ü. This release changes the default Solr configuration so that search matches words based on these associations, e.g. a search for Mercè would match the word Merce in a dataset, and vice versa. This should generally be helpful, but can result in false positives, e.g. "canon" will be found when searching for "cañon".
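Solr typically implements this kind of matching with an ASCII-folding filter in the field analyzer. A minimal sketch of such a filter (illustrative only; the schema.xml shipped with this release is the authoritative configuration):

    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

With preserveOriginal="true", both the folded token ("Merce") and the original ("Mercè") are indexed, which is what allows matching in both directions.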

Java 11, PostgreSQL 13, and Solr 8 Support/Upgrades

Several of the core components of the Dataverse Software have been upgraded. Specifically:

  • The Dataverse Software now runs on and requires Java 11. This will provide performance and security enhancements, allow developers to take advantage of new and updated Java features, and move the project to a platform with better long-term support. This upgrade requires a few extra steps in the release process, outlined below.
  • The Dataverse Software has now been tested with PostgreSQL versions up to 13. Versions 9.6+ will still work, but this update is necessary to support the software beyond PostgreSQL 9.6's end of life later in 2021.
  • The Dataverse Software now runs on Solr 8.8.1, the latest available stable release in the Solr 8.x series.

Saved Search Performance Improvements

A refactoring has greatly improved Saved Search performance in the application. If your installation has multiple, potentially long-running Saved Searches in place, this greatly improves the probability that those search jobs will complete without timing out.

Worldmap/Geoconnect Integration Now Obsolete

As of this release, the Geoconnect/Worldmap integration is no longer available. The Harvard University Worldmap is going through a migration process, and instead of updating this code to work with the new infrastructure, the decision was made to pursue future Geospatial exploration/analysis through other tools, following the External Tools Framework in the Dataverse Software.

Guides Updates

The Dataverse Software Guides have been updated to follow recent changes to how different terms are used across the Dataverse Project. For more information, see Mercè's note to the community:

https://groups.google.com/g/dataverse-community/c/pD-aFrpXMPo

Conditionally Required Metadata Fields

Prior to this release, when defining metadata for compound fields (via their dataset field types), fields could either be optional or required; i.e., if required, you must always have (at least one) value for that field. For example, Author Name being required means you must have at least one Author with a nonempty Author Name.

In order to support more robust metadata (and specifically to resolve #7551), we need to allow a third case: Conditionally Required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name.

Major Use Cases

Newly-supported major use cases in this release include:

  • Dataverse Installation Administrators can now deactivate users using a new API. (Issue #2419, PR #7629)
  • Superusers can remove all of a user's assigned roles using a new API. (Issue #2419, PR #7629)
  • Superusers can use an API to gather more information about actions a user has taken in the system in order to make an informed decision about whether or not to deactivate or delete a user. (Issue #2419, PR #7629)
  • Superusers will now be able to harvest from installations using ISO-639-3 language codes. (Issue #7638, PR #7690)
  • Users interacting with the workflow system will receive status messages (Issue #7564, PR #7635)
  • Users interacting with prepublication workflows will see speed improvements (Issue #7681, PR #7682)
  • API Users will receive Dataverse collection API responses in a deterministic order. (Issue #7634, PR #7708)
  • API Users will be able to access a list of crawlable URLs for file download, allowing for faster and easily resumable transfers. (Issue #7084, PR #7579)
  • Users will no longer be able to access summary stats for restricted files. (Issue #7619, PR #7642)
  • Users will now see truncated versions of long strings (primarily checksums) throughout the application (Issue #6685, PR #7312)
  • Users will now be able to easily copy checksums, API tokens, and private URLs with a single click (Issue #6039, Issue #6685, PR #7539, PR #7312)
  • Users uploading data through the Direct Upload API will now be able to use additional checksums (Issue #7600, PR #7602)
  • Users searching for content will now be able to search using non-ASCII characters. (Issue #820, PR #7378)
  • Users can now replace files in draft datasets, a functionality previously only available on published datasets. (Issue #7149, PR #7337)
  • Dataverse Installation Administrators can now set subfields of compound fields as conditionally required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name. (Issue #7606, PR #7608)

Notes for Dataverse Installation Administrators

Java 11 Upgrade

There are some things to note and keep in mind regarding the move to Java 11:

  • You should install the JDK/JRE following your usual methods, depending on your operating system. An example of this on a RHEL/CentOS 7 or RHEL/CentOS 8 system is:

    $ sudo yum remove java-1.8.0-openjdk java-1.8.0-openjdk-devel java-1.8.0-openjdk-headless

    $ sudo yum install java-11-openjdk-devel

    The remove command may report an error if the -headless package isn't installed. (A quick check of the active Java version is shown after this list.)

  • We targeted and tested Java 11, but 11+ will likely work. Java 11 was targeted because of its long-term support.

  • If you're moving from a Dataverse installation that was previously running Glassfish 4.x (typically this would be Dataverse Software 4.x), you will need to adjust some JVM options in domain.xml as part of the upgrade process. We've provided these optional steps below. These steps are not required if your first installed Dataverse Software version was running Payara 5.x (typically Dataverse Software 5.x).
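Once the new JDK is installed, you can confirm which Java version is active (exact output varies by distribution):

    $ java -version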

PostgreSQL Versions Up To 13 Supported

Up until this release, our Installation Guide "strongly recommended" installing PostgreSQL v. 9.6. While tha...


v5.3

10 Dec 22:26
fcb5ce7

Dataverse 5.3

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Auxiliary Files (Experimental)

Auxiliary files can now be added to datafiles and accessed using new experimental API endpoints. These endpoints allow additional, non-Dataverse-generated metadata to be added alongside datafiles in Dataverse.

Support for auxiliary files in Dataverse is being driven by integration with the Open Differential Privacy (OpenDP) Project and is designed to support the deposit and retrieval of differentially private metadata, but the endpoints are not specific to differential privacy use cases.

Additional Banner Functionality

Banners in Dataverse can now be set to allow dismissal by a logged-in user. Previously, banners persisted until they were removed by an administrator. This change allows administrators to more easily communicate one-time messages to users.

File Tags Searchable from Advanced Search and Dataset Search

File tags ("Documentation", "Data", "Code", etc.) now appear on the Advanced Search page.

Performing a search for files on the dataset page now includes file tags. Previously, only file name and file description were searched.

Easier Configuration of Database Connections

Previously, the configuration of database connections was quite static and not easy to update. This has been an issue especially for cloud and container usage. Using new technologies provided by the move to Payara, you can now more easily configure the connection to your PostgreSQL database.

Using the MicroProfile Config API (Issue #7000, Issue #7418), you can much more easily specify configuration details. For an overview of supported options, please see the Installation Guide.

Note that some settings have been moved from domain.xml to code, such as min and max pool size.

Major Use Cases

Newly-supported use cases in this release include:

  • Users can use an API to add auxiliary files to files in order to provide metadata representations for specific tools or integrations (Issue #7275, PR #7350)
  • Administrators can use a new API to manage banner messages and take advantage of new banner display options (Issue #7263, PR #7434)
  • Users replacing files will now have their files renamed when a file name conflict exists, making the behavior consistent with upload and edit (Issue #7335, PR #7336)
  • Users will now be able to search on file tags on the advanced search and dataset pages (Issue #7194, PR #7385)

Notes for Dataverse Installation Administrators

Payara 5.2020.6 (or Higher) Required

Some changes in this release require an upgrade to Payara 5.2020.6 or higher.

Instructions on how to update can be found in the Payara documentation.

New Banner API, Obsolete DB Settings

The functionality previously provided by the DB settings :StatusMessageHeader and :StatusMessageText is no longer supported and is now provided through the Manage Banner Messages API. Learn more in the API Guide.
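As a sketch (assuming the endpoint and payload shape documented in the API Guide; the message text is a placeholder), adding a dismissible banner might look like this:

    curl -H "Content-type:application/json" -X POST "$SERVER_URL/api/admin/bannerMessage" \
      -d '{"dismissibleByUser": "true", "messageTexts": [{"lang": "en", "message": "Maintenance tonight, 22:00-23:00 UTC."}]}'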

New Database Settings and JVM Options

Several new JVM options have been added in this release:

  • dataverse.db.name
  • dataverse.db.user
  • dataverse.db.password
  • dataverse.db.host
  • dataverse.db.port

For an overview of these new options, please see the Installation Guide.

See above note about obsolete DB options.

Introducing MicroProfile Config API

With this Dataverse release, Dataverse Administrators can start to make use of the MicroProfile Config API.

This will benefit both developers and sysadmins, but the codebase will have to be refactored to make use of it. As this will take time, we will always provide a backward-compatible way of using it.
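For example, under the standard MicroProfile Config environment variable mapping (dots become underscores and names are uppercased), a property such as dataverse.db.host can also be supplied through the environment, assuming the installation reads it via a MicroProfile Config source:

    # equivalent to setting the dataverse.db.host and dataverse.db.user properties
    export DATAVERSE_DB_HOST=postgres.example.edu
    export DATAVERSE_DB_USER=dataverse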

For more details about these new options, please see the Consuming Configuration section of the Developer Guide.

Java Message System Configuration

The Ingest process uses the Java Message System to create ingest tasks in a queue. Previously, that queue was configured from the command line or in domain.xml; this configuration is now done in code.

In the unlikely case you might want to change any of these settings, feel free to change and recompile, or raise an issue on GitHub. See IngestQueueProducer for more details.

If you want to clean up your existing installation, you can delete the old, unused queue like this:

  • <payara install path>/bin/asadmin delete-connector-connection-pool --cascade=true jms/IngestQueueConnectionFactoryPool

Notes for Tool Developers and Integrators

Experimental Auxiliary File Support

Experimental endpoints have been added to allow auxiliary files to be added to datafiles. These auxiliary files can be deposited and accessed via API. Later releases will include options for accessing these files through the UI. For more information, see the Auxiliary File Support section of the Developer Guide.
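A rough sketch of depositing and retrieving an auxiliary file (the path follows the pattern described in the Developer Guide; the file id, format tag, and version here are placeholders):

    # deposit an auxiliary file for datafile 42, tagged "dpJson", version "v1"
    curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F "file=@dp-metadata.json" "$SERVER_URL/api/access/datafile/42/auxiliary/dpJson/v1"

    # retrieve it again
    curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/access/datafile/42/auxiliary/dpJson/v1"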

Complete List of Changes

For the complete list of code changes in this release, see the 5.3 Milestone on GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

1. Upgrade to Payara 5.2020.6 or higher.

Instructions on how to update can be found in the Payara documentation.

It would likely be safer to upgrade Payara first, while still running Dataverse 5.2, and then proceed with the steps below. Upgrading from an earlier version of Payara should be a straightforward process: undeploy Dataverse; stop Payara; move the current Payara directory out of the way; unzip the new Payara version in its place; replace the brand-new payara/glassfish/domains/domain1 with your old, preserved domain1; start Payara; and deploy Dataverse 5.2. We still recommend that you read the detailed upgrade instructions above; if you run into any issues with this upgrade, it will help to be able to separate them from any problems with the upgrade of Dataverse proper.
If you are still using a pre-5.0 version of Dataverse and Glassfish version 4, please follow the upgrade instructions in the Dataverse 5.0 release notes, but use the latest version of Payara 5 (5.2020.7, as of this writing).

2. Undeploy the previous version.

  • <payara install path>/bin/asadmin list-applications
  • <payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

3. Update your database connection.

Please configure your connection details, replacing all of the ${DB_...} placeholders.

  • <payara install path>/bin/asadmin create-system-properties "dataverse.db.user=${DB_USER}"

  • <payara install path>/bin/asadmin create-system-properties "dataverse.db.host=${DB_HOST}"

  • <payara install path>/bin/asadmin create-system-properties "dataverse.db.port=${DB_PORT}"

  • <payara install path>/bin/asadmin create-system-properties "dataverse.db.name=${DB_NAME}"

  • echo "AS_ADMIN_ALIASPASSWORD=${DB_PASS}" > /tmp/password.txt

  • <payara install path>/bin/asadmin create-password-alias --passwordfile /tmp/password.txt dataverse.db.password

  • rm /tmp/password.txt

4. In domain.xml, verify that the __TimerPool jdbc-connection-pool is using the H2 database, as follows (if you have the old Derby version from Glassfish 4, replace it):

<jdbc-connection-pool datasource-classname="org.h2.jdbcx.JdbcDataSource" name="__TimerPool" res-type="javax.sql.XADataSource">
  <property name="URL" value="jdbc:h2:${com.sun.aas.instanceRoot}/lib/databases/ejbtimer;AUTO_SERVER=TRUE"></property>
</jdbc-connection-pool>

5. Reset the EJB timer database back to default:

  • <payara install path>/bin/asadmin set configs.config.server-config.ejb-container.ejb-timer-service.timer-datasource=jdbc/__TimerPool

6. Delete the old password alias and DB pool:

  • <payara install path>/bin/asadmin delete-jdbc-connection-pool --cascade=true dvnDbPool
  • <payara install path>/bin/asadmin delete-password-alias db_password_alias

7. Stop payara, remove the generated and ejbtimer database directories, then restart.

  • service payara stop
  • rm -rf <payara install path>/glassfish/domains/domain1/generated
  • rm -rf <payara install path>/glassfish/domains/domain1/lib/databases/ejbtimer
  • service payara start

8. Deploy this version.

  • <payara install path>/bin/asadmin deploy dataverse-5.3.war

9. Restart payara

  • service payara stop
  • service payara start

v5.2

09 Nov 21:31
4951505

Dataverse 5.2

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

File Preview When Guestbooks or Terms Exist

Previously, file preview was only available when files were publicly downloadable. Now, if a guestbook or terms (or both) are configured for the dataset, they will be shown in the Preview tab, and once they are agreed to, the file preview will appear (#6919).

Preview Only External Tools

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919).

Dataset Page Edit Options Consolidation

As part of the continued effort to redesign the Dataset and File pages, some of the edit options for a file on the dataset page are being moved into a "kebab" menu to allow for better consistency and future scalability.

Google Cloud Archiver

Dataverse Bags can now be sent to a bucket in Google Cloud, including those in the "Coldline" storage class, which provides less expensive but slower access.

Major Use Cases

Newly-supported use cases in this release include:

  • Users can now preview files that have a guestbook or terms. (Issue #6919, PR #7369)
  • External tool developers can indicate that their tool is "preview only". (Issue #6919, PR #7369)
  • Dataverse Administrators can set up a regular export to Google Cloud so that the installation's data is preserved (Issue #7140, PR #7292)
  • Dataverse Administrators can use a regex when defining a group (Issue #7344, PR #7351)
  • External Tool Developers can use a new API endpoint to retrieve a user's information (Issue #7307, PR #7345)

Notes for Dataverse Installation Administrators

Converting Explore External Tools to Preview Only

When the war file is deployed, a SQL migration script will convert dataverse-previewers to have both "explore" and "preview" types so that they will continue to be displayed in the Preview tab.

If you would prefer that these tools be preview only, you can delete the tools, adjust the JSON manifests (changing "explore" to "preview"), and re-add them.

New Database Settings and JVM Options

Installations integrating with Google Cloud Archiver will need to use two new database settings:

  • :GoogleCloudProject - the name of the project managing the bucket
  • :GoogleCloudBucket - the name of the bucket to use

For more information, see the Google Cloud Configuration section of the Installation Guide.

Automation of Make Data Count Scripts

Scripts have been added in order to automate Make Data Count processing. For more information, see the Make Data Count section of the Admin Guide.

Notes for Tool Developers and Integrators

Preview Only External Tools, "hasPreviewMode"

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919). This "preview" type replaces "hasPreviewMode", which has been removed.

Multiple Types for External Tools

External tools now support multiple types. In practice, the combination of "explore" and "preview" is the only one that makes a difference in the UI, as opposed to having only one type or the other (see "preview only" above). Multiple types are specified in the JSON manifest with an array in "types". The older, single "type" is still supported but should be considered deprecated.
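A sketch of a manifest using the "types" array (the tool itself is hypothetical; field names follow the External Tools documentation):

    {
      "displayName": "Sample Previewer",
      "scope": "file",
      "types": ["explore", "preview"],
      "toolUrl": "https://example.com/previewer",
      "contentType": "text/plain",
      "toolParameters": {
        "queryParameters": [
          {"fileid": "{fileId}"},
          {"siteUrl": "{siteUrl}"}
        ]
      }
    }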

User Information Endpoint

A new API endpoint retrieves user information so that tools can email users if needed.
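For example (a sketch; see the API Guide for the authoritative description):

    curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/users/:me"

The JSON response describes the authenticated user, including the email address a tool would need for notifications.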

Complete List of Changes

For the complete list of code changes in this release, see the 5.2 Milestone on GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

1. Undeploy the previous version.

  • <payara install path>/bin/asadmin list-applications
  • <payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

2. Stop Payara, remove the generated directory, then start Payara.

  • service payara stop
  • remove the generated directory:
    rm -rf <payara install path>/glassfish/domains/domain1/generated
  • service payara start

3. Deploy this version.

  • <payara install path>/bin/asadmin deploy dataverse-5.2.war

4. Restart payara

  • service payara stop
  • service payara start

v5.1.1

08 Oct 18:02
559a449

Dataverse 5.1.1

This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.

Release Highlights

Connection Pool Size Configuration Option, Connection Optimizations

Dataverse 5.1 improved the efficiency of making S3 connections through use of an http connection pool. This release adds optimizations around closing streams and channels that may hold S3 http connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and adds the ability to increase the size of the connection pool, so a larger pool can be configured if needed.

Major Use Cases

Newly-supported use cases in this release include:

  • Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)

Notes for Dataverse Installation Administrators

5.1.1 vs. 5.1 for Production Use

As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.

New JVM Option for Connection Pool Size

Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:

./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"
(where <id> is the identifier of your S3 file store, likely "s3"). The JVM Options section of the Configuration Guide has more information.

Complete List of Changes

For the complete list of code changes in this release, see the 5.1.1 Milestone on GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide

Upgrade Instructions

  1. These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the Dataverse 5.1 Release Notes.

  2. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

  3. Stop Payara, remove the generated directory, then start Payara.
  • service payara stop
  • remove the generated directory:
    rm -rf <payara install path>/glassfish/domains/domain1/generated
  • service payara start
  4. Deploy this version.
    <payara install path>/bin/asadmin deploy dataverse-5.1.1.war

  5. Restart Payara

Dataverse 5.1

06 Oct 15:55
7a0eef0

Dataverse 5.1

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Large File Upload for Installations Using AWS S3

The added support for multipart upload through the API and UI (Issue #6763) will allow files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations would allow uploads larger than 5 GB.

Dataset-Specific Stores

In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store.

Major Use Cases

Newly-supported use cases in this release include:

  • Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
  • Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
  • Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
  • Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files contains duplicate names (Issue #80, PR #7276)
  • Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
  • Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause stale search results to not load. (Issue #4225, PR #7211)

Notes for Dataverse Installation Administrators

New API for setting a Dataset-level Store

  • This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverse and Datasets section of the Admin Guide.
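A sketch of what setting the store for a single dataset might look like (the endpoint shape follows the Admin Guide; the dataset id and driver name are placeholders):

    curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d s3 "$SERVER_URL/api/datasets/$DATASET_ID/storageDriver"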

Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload

Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to do periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the Developers Guide.

While multipart uploads can support much larger files, and can have advantages in terms of robust transfer and speed, they are more complex than single part direct uploads. Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release).

New APIs for keeping Solr records in sync

This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the Admin Guide.

Documentation for Purging the Ingest Queue

At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the Admin Guide now has specific steps.

Biomedical Metadata Block Updated

The Life Science Metadata block (biomedical.tsv) was updated. "Other Design Type", "Other Factor Type", "Other Technology Type", "Other Technology Platform" boxes were added. See the "Additional Upgrade Steps" below if you use this in your installation.

Notes for Tool Developers and Integrators

Spaces in File Names

Dataverse Installations using S3 storage will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+.

Complete List of Changes

For the complete list of code changes in this release, see the 5.1 Milestone on GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide

Upgrade Instructions

  1. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

  2. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

  3. Stop Payara, remove the generated directory, then start Payara.
  • service payara stop
  • remove the generated directory:
    rm -rf <payara install path>/glassfish/domains/domain1/generated
  • service payara start
  4. Deploy this version.
    <payara install path>/bin/asadmin deploy dataverse-5.1.war

  5. Restart Payara

Additional Upgrade Steps

  1. Update Biomedical Metadata Block (if used), Reload Solr, ReExportAll

    wget https://github.com/IQSS/dataverse/releases/download/v5.1/biomedical.tsv
    curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"

  2. Check if your Solr installation is running with the latest schema.xml config file (https://github.com/IQSS/dataverse/releases/download/v5.1/schema.xml), update if needed.

  3. Run the script updateSchemaMDB.sh to generate updated solr schema files and preserve any other custom fields in your Solr configuration.
    For example: (modify the path names as needed)
    cd /usr/local/solr-7.7.2/server/solr/collection1/conf
    wget https://github.com/IQSS/dataverse/releases/download/v5.1/updateSchemaMDB.sh
    chmod +x updateSchemaMDB.sh
    ./updateSchemaMDB.sh -t .
    See http://guides.dataverse.org/en/5.1/admin/metadatacustomization.html?highlight=updateschemamdb for more information.

  4. Run ReExportall to update JSON Exports
    http://guides.dataverse.org/en/5.1/admin/metadataexport.html?highlight=export#batch-exports-through-the-api

Dataverse 5.0

18 Aug 21:34
993d0a3

Dataverse 5.0

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that this is a major release and these are long release notes. We offer no apologies. :)

Release Highlights

Continued Dataset and File Redesign: Dataset and File Button Redesign, Responsive Layout

The buttons available on the Dataset and File pages have been redesigned. This change is to provide more scalability for future expanded options for data access and exploration, and to provide a consistent experience between the two pages. The dataset and file pages have also been redesigned to be more responsive and function better across multiple devices.

This is an important step in the incremental process of the Dataset and File Redesign project, following the release of on-page previews, filtering and sorting options, tree view, and other enhancements. Additional features in support of these redesign efforts will follow in later 5.x releases.

Payara 5

A major upgrade of the application server provides security updates, access to new features like MicroProfile Config API, and will enable upgrades to other core technologies.

Note that moving from Glassfish to Payara will be required as part of the move to Dataverse 5.

Download Dataset

Users can now more easily download all files in a dataset through both the UI and API. If this causes server instability, it's suggested that Dataverse Installation Administrators take advantage of the new Standalone Zipper Service described below.

Download All Option on the Dataset Page

In previous versions of Dataverse, downloading all files from a dataset meant several clicks to select files and initiate the download. The Dataset Page now includes a Download All option for both the original and archival formats of the files in a dataset under the "Access Dataset" button.

Download All Files in a Dataset by API

In previous versions of Dataverse, downloading all files from a dataset via API was a two step process:

  • Find all the database ids of the files.
  • Download all the files, using those ids (comma-separated).

Now you can download all files from a dataset (assuming you have access to them) via API by passing the dataset persistent ID (PID such as DOI or Handle) or the dataset's database id. Versions are also supported, and you can pass :draft, :latest, :latest-published, or numbers (1.1, 2.0) similar to the "download metadata" API.
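For example (a sketch with a hypothetical DOI; -J saves the zip under the server-supplied name):

    # latest accessible version
    curl -L -O -J "$SERVER_URL/api/access/dataset/:persistentId/?persistentId=doi:10.5072/FK2/EXAMPLE"

    # a specific published version
    curl -L -O -J "$SERVER_URL/api/access/dataset/:persistentId/versions/1.1/?persistentId=doi:10.5072/FK2/EXAMPLE"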

A Multi-File, Zipped Download Optimization

In this release we are offering an experimental optimization for the multi-file, download-as-zip functionality. If this option is enabled, instead of enforcing size limits, we attempt to serve all the files that the user requested (that they are authorized to download), but the request is redirected to a standalone zipper service running as a CGI executable. This moves these potentially long-running jobs completely outside the application server (Payara) and prevents service threads from becoming locked while serving them. Since zipping is also a CPU-intensive task, it is possible to run this service on a different host system, freeing cycles on the main application server. The system running the service needs access to the database as well as to the storage filesystem and/or S3 bucket.

Please consult the scripts/zipdownload/README.md in the Dataverse 5 source tree.

The components of the standalone "zipper tool" can also be downloaded here:

https://github.com/IQSS/dataverse/releases/download/v5.0/zipper.zip

Updated File Handling

Files without extensions can now be uploaded through the UI. This release also changes the way Dataverse handles duplicate (filename or checksum) files in a dataset. Specifically:

  • Files with the same checksum can be included in a dataset, even if the files are in the same directory.
  • Files with the same filename can be included in a dataset as long as the files are in different directories.
  • If a user uploads a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded.
  • If the directory or name of an existing or newly uploaded file is edited in such a way that would create a directory/filename combination that already exists, Dataverse will display an error.
  • If a user attempts to replace a file with another file that has the same checksum, an error message will be displayed and the file cannot be replaced.
  • If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed.

Pre-Publish DOI Reservation with DataCite

Dataverse installations using DataCite will be able to reserve the persistent identifiers for datasets with DataCite ahead of publishing time. This allows the DOI to be reserved earlier in the data sharing process and makes the step of publishing datasets simpler and less error-prone.

PrimeFaces 8

PrimeFaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.

Major Use Cases

Newly-supported use cases in this release include:

  • Users will be presented with a new workflow around dataset and file access and exploration. (Issue #6684, PR #6909)
  • Users will experience a UI appropriate across a variety of device sizes. (Issue #6684, PR #6909)
  • Users will be able to download an entire dataset without needing to select all the files in that dataset. (Issue #6564, PR #6262)
  • Users will be able to download all files in a dataset with a single API call. (Issue #4529, PR #7086)
  • Users will have DOIs reserved for their datasets upon dataset creation instead of at publish time. (Issue #5093, PR #6901)
  • Users will be able to upload files without extensions. (Issue #6634, PR #6804)
  • Users will be able to upload files with the same name in a dataset, as long as those files are in different file paths. (Issue #4813, PR #6924)
  • Users will be able to upload files with the same checksum in a dataset. (Issue #4813, PR #6924)
  • Users will be less likely to encounter locks during the publishing process due to PID providers being unavailable. (Issue #6918, PR #7118)
  • Users will now have their files validated during publish, and in the unlikely event that anything has happened to the files between deposit and publish, they will be able to take corrective action. (Issue #6558, PR #6790)
  • Administrators will likely see more success with Harvesting, as many minor harvesting issues have been resolved. (Issues #7127, #7128, #4597, #7056, #7052, #7023, #7009, and #7003)
  • Administrators can now enable an external zip service that frees up application server resources and allows the zip download limit to be increased. (Issue #6505, PR #6986)
  • Administrators can now create groups based on users' email domains. (Issue #6936, PR #6974)
  • Administrators can now set date facets to be organized chronologically. (Issue #4977, PR #6958)
  • Administrators can now link harvested datasets using an API. (Issue #5886, PR #6935)
  • Administrators can now destroy datasets with mapped shapefiles. (Issue #4093, PR #6860)

Notes for Dataverse Installation Administrators

Glassfish to Payara

This upgrade requires a few extra steps. See the detailed upgrade instructions below.

Dataverse Installations Using DataCite: Upgrade Action Required

If you are using DataCite as your DOI provider you must add a new JVM option called "doi.dataciterestapiurlstring" with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. More information about this JVM option can be found in the Installation Guide.

"doi.mdcbaseurlstring" should be deleted if it was previously set.

Dataverse Installations Using DataCite: Upgrade Action Recommended

For installations that are using DataCite, Dataverse v5.0 introduces a change in the process of registering the Persistent Identifier (DOI) for a dataset. Instead of registering it when the dataset is published for the first time, Dataverse will try to "reserve" the DOI when it's created (by registering it as a "draft", using DataCite terminology). When the user publishes the dataset, the DOI will be publicized as well (by switching the registration status to "findable"). This approach makes the process of publishing datasets simpler and less error-prone.

New APIs have been provided for finding any unreserved DataCite-issued DOIs in your Dataverse installation, and for reserving them (see below). While not required (the user can still attempt to publish a dataset with an unreserved DOI), having all the identifiers reserved ahead of time is recommended. If you are upgrading an installation that uses DataCite, we specifically recommend that you reserve the DOIs for all your pre-existing unpublished drafts as soon as Dataverse v5.0 is deployed, since none of them were registered at create time. This can be done using the following API calls:

  • /api/pids/unreserved will report the ids of the datasets
  • /api/pids/:persistentId/reserve reserves the assigned DOI with DataCite (this will need to be run on every id reported by the first API).

See the Native API Guide for more information.

Scripted, the whole process would look as follows (adj...


4.20

01 Apr 20:00
4e07b62

Dataverse 4.20

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Multiple Store Support

Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores).

General information about this capability can be found below and in the Configuration Guide - File Storage section.

S3 Direct Upload support

S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With MinIO the theoretical limit should be 5 TB, and 50+ GB file uploads have been tested successfully. (In practice, other factors such as network timeouts may prevent a successful upload of a multi-TB file, and MinIO instances may be configured with a single-HTTP-call limit below 5 TB.) No other S3 service providers have been tested yet. Their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit.

General information about this capability can be found in the Big Data Support Guide with specific information about how to enable it in the Configuration Guide - File Storage section.

Integration Test Coverage Reporting

The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.

New APIs

New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed.

More information can be found in the API Guide.
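For example, the dataset size call might look like this (a sketch; the dataset id is a placeholder, and the available parameters are described in the API Guide):

    curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/$DATASET_ID/storagesize"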

Major Use Cases

Newly-supported use cases in this release include:

  • Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262)
  • Users will be able to upload large files directly to S3. (Issue #6489, PR #6490)
  • Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628)
  • Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods (Issue #6485, PR #6488)
  • Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622)
  • Administrators and integrators will be able to determine a dataset's size. (Issue #6524, PR #6609)
  • Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. (Issue #6601, PR #6623)

Notes for Dataverse Installation Administrators

Potential Data Integrity Issue

We recently discovered two potential data integrity issues in Dataverse databases. One manifests itself as duplicate DataFile objects created for the same uploaded file (#6522); the other as duplicate DataTable (tabular metadata) objects linked to the same DataFile (#6510). These issues impacted approximately 0.03% of datasets in Harvard's Dataverse.

To see if any datasets in your installation have been impacted by this data integrity issue, we've provided a diagnostic script here:

https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/check_datafiles_6522_6510.sh

The script relies on the PostgreSQL utility psql to access the database. You will need to edit the credentials at the top of the script to match your database configuration.

If neither of the two issues is present in your database, you will see the messages "... no duplicate DataFile objects in your database" and "... no tabular files affected by this issue in your database".

If either or both kinds of duplicates are detected, the script will provide further instructions. We will need you to send us the output it produces. We will then assist you in resolving the issues in your database.

Multiple Store Support Changes

Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.

Multistore support requires that each store be assigned a label, id, and type - see the Configuration Guide for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get the id 'file', and an 's3' store would have the id 's3'.

With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files.

The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:
For a file store:

./asadmin create-jvm-options "-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "-Ddataverse.files.file.directory=<your directory>"

For an s3 store:

./asadmin create-jvm-options "-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"

Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above: delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured.

Once these options are set, restarting the Glassfish service is all that is needed to complete the change.

Note that "-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independently of the location of any 'file' store defined above.

Also note that the :MaxFileUploadSizeInBytes property has a new option to provide independent limits for each store instead of a single value for the whole installation. The default is to apply any existing limit defined by this property to all stores.

Direct S3 Upload Changes

Direct upload to S3 is enabled per store by one new JVM option:

./asadmin create-jvm-options "-Ddataverse.files.<id>.upload-redirect=true"

The existing :MaxFileUploadSizeInBytes property and the dataverse.files.<id>.url-expiration-minutes JVM option for the same store also apply to direct upload.

Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit JVM option.
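For example (the option name is as above; the 5 GB value and the <id> store identifier are illustrative):

    ./asadmin create-jvm-options "-Ddataverse.files.<id>.ingestsizelimit=5000000000"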

API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use.

Solr Update

With this release we upgrade to the latest available stable release in the Solr 7.x branch. We recommend a fresh installation of Solr 7.7.2 (the index will be empty) followed by an "index all".

Before you start the "index all", Dataverse will appear to be empty because the search results come from Solr. As indexing progresses, results will appear until indexing is complete.

Dataverse Linking Fix

The fix implemented for #6262 will display the datasets contained in linked dataverses in the linking dataverse. The full reindex described above will correct these counts. Going forward, this will happen automatically whenever a dataverse is linked.

Google Analytics Download Tracking Bug

The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting. The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. Alternatively, sites can modify their existing files to include the one-line change made in the example file at line 120.

Run ReExportall

We made changes to the JSON Export in this release (Issue #6650, PR #6669). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.

New JVM Options and Database Settings

New JVM Options for file storage drivers

  • The JVM option dataverse.files.file.directo...

4.19

22 Jan 16:34
affbf4f

Dataverse 4.19

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

OpenID Connect Support

Dataverse now provides basic support for any OpenID Connect (OIDC) compliant authentication provider.

Prior to supporting this standard, new authentication methods needed to be added by pull request. OIDC support provides a standardized way for authentication, sharing user information, and more. You are able to use any compliant provider just by loading a configuration file, without touching the codebase. While the usual prominent providers like Google feature OIDC support, there are plenty of other options to easily attach your installation to a custom authentication provider, using enterprise-grade software.

See the OpenID Connect Login Options documentation in the Installation Guide for more details.

This is to be extended with support for attribute mapping, group syncing, and more in future versions of the code.

Python Installer

We are introducing a new installer script, written in Python. It is intended to eventually replace the old installer (written in Perl). For now it is being offered as an (experimental) alternative.

See README_python.txt in scripts/installer and/or in the installer bundle for more information.

Major Use Cases

Newly-supported use cases in this release include:

  • Dataverse installation administrators will be able to experiment with a Python Installer (Issue #3937, PR #6484)
  • Dataverse installation administrators will be able to set up OIDC-compliant login options by editing a configuration file, with no need for a code change (Issue #6432, PR #6433)
  • Following setup by a Dataverse administrator, users will be able to log in using OIDC-compliant methods (Issue #6432, PR #6433)
  • Users of the Search API will see additional fields in the JSON output (Issues #6300, #6396, PR #6441)
  • Users loading the support form will now be presented with the math challenge as expected and will be able to successfully send an email to support (Issue #6307, PR #6462)
  • Users of https://mybinder.org can now spin up Jupyter Notebooks and other computational environments from Dataverse DOIs (Issue #4714, PR #6453)

Notes for Dataverse Installation Administrators

Security vulnerability in Solr

A serious security issue has recently been identified in multiple versions of the Solr search engine, including v7.3, which Dataverse currently uses. Follow the instructions below to verify that your installation is safe from a potential attack. You can also consult the following link for a detailed description of the issue:

RCE in Solr via Velocity Template.

The vulnerability allows an intruder to execute arbitrary code on the system running Solr. Fortunately, it can only be exploited if the Solr API access point is open to direct access from public networks (aka "the outside world"), which is NOT needed in a Dataverse installation.

We have always recommended having Solr (port 8983) firewalled off from public access in our installation guides. But we recommend that you double-check your firewall settings and verify that the port is not accessible from outside networks. The simplest quick test is to try the following URL in your browser:

  http://<your Solr server address>:8983

and confirm that you get "access denied" or that it times out, etc.

In most cases, when Solr runs on the same server as the Dataverse web application, you will only want the port accessible from localhost. We also recommend that you add the following arguments to the Solr startup command: -j jetty.host=127.0.0.1. This will make Solr accept connections from localhost only, adding redundancy in case of a firewall failure.

In a case where Solr needs to run on a different host, make sure that the firewall limits access to the port to only the Dataverse web host(s), by specific IP address(es).
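On a firewalld-based system, a rule of this shape would limit port 8983 to a single Dataverse web host (the source address is a placeholder):

    sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="203.0.113.10" port port="8983" protocol="tcp" accept'
    sudo firewall-cmd --reload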

We would also like to reiterate that it is simply never a good idea to run Solr as root! Running the process as a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised.

Citation and Geospatial Metadata Block Updates

We updated two metadata blocks in this release. Updating these metadata blocks is mentioned in the step-by-step upgrade instructions below.

Run ReExportall

We made changes to the JSON Export in this release (#6426). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.

BinderHub

https://mybinder.org now supports spinning up Jupyter Notebooks and other computational environments from Dataverse DOIs.

Widgets update for OpenScholar

We updated the code for widgets so that they will keep working on OpenScholar sites after the upcoming OpenScholar upgrade to Drupal 8. If users of your Dataverse installation have embedded widgets on an OpenScholar site that upgrades to Drupal 8, you will need to run this Dataverse version (or later) for the widgets to keep working.

Payara tech preview

Dataverse 4 has always run on Glassfish 4.1, but changes in this release (PR #6523) should open the door to upgrading to Payara 5 eventually. Production installations of Dataverse should remain on Glassfish 4.1, but feedback from any experiments running Dataverse on Payara 5 is welcome via the usual channels.

Notes for Tool Developers and Integrators

Search API

The boolean parameter query_entities has been removed from the Search API. The former "true" behavior, in which entities are queried via direct database calls (originally intended for developer use), is now always in effect.

Additional fields are now available via the Search API, mostly related to information about specific dataset versions.
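
For example, a query such as the following (against a hypothetical installation at localhost) returns JSON that includes the new version-related fields:

  curl "http://localhost:8080/api/search?q=trees&type=dataset"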

Complete List of Changes

For the complete list of code changes in this release, see the 4.19 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, and start glassfish again.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.19.war
  4. Restart glassfish.

  5. Update the Geospatial Metadata Block.

  • wget https://github.com/IQSS/dataverse/releases/download/v4.19/geospatial.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"
  6. (Optional) Run ReExportall to update JSON exports:

    http://guides.dataverse.org/en/4.19/admin/metadataexport.html?highlight=export#batch-exports-through-the-api
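
Per the guide linked above, the batch re-export can be started with a single call to the admin API:

  curl http://localhost:8080/api/admin/metadata/reExportAll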

4.18.1

20 Nov 23:24
a91d370

Dataverse 4.18.1

This release provides a fix for a regression introduced in 4.18 and implements a few other small changes.

Release Highlights

Proper Validation Messages

When creating or editing dataset metadata, users were not receiving field-level indications about what entries failed validation and were only receiving a message at the top of the page. This fix restores field-level indications.

Major Use Cases

Use cases in this release include:

  • Users will receive the proper messaging when dataset metadata entries are not valid.
  • Users can now view the expiration date of an API token and revoke a token on the API Token tab of the account page.

Complete List of Changes

For the complete list of code changes in this release, see the 4.18.1 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, and start glassfish again.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.1.war
  4. Restart glassfish.

4.18

14 Nov 18:13
118aa71

Dataverse 4.18

Note: There is an issue in 4.18 with the display of validation messages on the dataset page (#6380) and we recommend using 4.18.1 for any production environments.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

File Page Previews and Previewers

File-level External Tools can now be configured to display in a "Preview Mode" designed for embedding within the file landing page.

While not technically part of this release, previewers have been made available for several common file types. The previewers support spreadsheet, image, text, document, audio, video, and HTML files, among others. These previewers can be found in the Qualitative Data Repository Github repository. The spreadsheet viewer was contributed by the Dataverse SSHOC project.

Microsoft Login

Users can now create Dataverse accounts and log in using self-provisioned Microsoft accounts such as live.com and outlook.com. Users can also use Microsoft accounts managed by their institutions. This new feature not only makes it easier to log in to Dataverse but will also streamline interaction with any external tools that utilize Azure services requiring login.

Add Data and Host Dataverse

More workflows to add data have been added across the UI, including a new button on the My Data tab of the Account page and a link in the Dataverse navbar, which displays on every page. This will provide users much easier access to start depositing data. By default, the Host Dataverse for these new Add Data workflows will be the installation's root dataverse, but there is now a dropdown component allowing creators to select any dataverse in which they have the proper permissions to create a new dataverse or dataset.

Primefaces 7

Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.

Integration Test Pipeline and Test Health Reporting

As part of the Dataverse Community's ongoing efforts to provide more robust automated testing infrastructure, and in support of the project's desire to keep the develop branch constantly in a "release ready" state, API-based integration tests are now run every time a branch is merged to develop. The status of the last test run is available as a badge at the bottom of the README.md file that serves as the homepage of the Dataverse Github repository.

Make Data Count Metrics Updates

A new configuration option has been added that allows Make Data Count metrics to be collected, but not reflected in the front end. This option was designed to allow installations to collect and verify metrics for a period before turning on the display to users. It is suggested that installations turn on Make Data Count as part of the upgrade.

Search API Enhancements

The Dataverse Search API will now display unpublished content when an API token is passed (and appropriate permissions exist).

Additional Dataset Author Identifiers

The following dataset author identifiers are now supported:

  • DAI
  • ResearcherID
  • ScopusID

Major Use Cases

Newly-supported use cases in this release include:

  • Users can view previews of several common file types, eliminating the need to download or explore a file just to get a quick look.
  • Users can log in using self-provisioned Microsoft accounts and also can log in using Microsoft accounts managed by an organization.
  • Dataverse administrators can now revoke and regenerate API tokens with an API call.
  • Users will receive notifications when their ingests complete, and will be informed if the ingest was a success or failure.
  • Dataverse developers will receive feedback about the health of the develop branch after their pull requests are merged.
  • Dataverse tool developers will be able to query the Dataverse API for unpublished data as well as published data.
  • Dataverse administrators will be able to collect Make Data Count metrics without turning on the display for users.
  • Users with a DAI, ResearcherID, or ScopusID can use these author identifiers in their datasets.

Notes for Dataverse Installation Administrators

API Token Management

  • You can now delete a user's API token, recreate a user's API token, and find a token's expiration date. See the Native API guide for more information.

New JVM Options

:mdcbaseurlstring allows Dataverse administrators to use a test base URL for Make Data Count.
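
JVM options of this kind are typically set with asadmin. The exact option name and the DataCite test URL below are assumptions to verify against the Installation Guide:

  # Sketch: point Make Data Count at a test base URL (option name and URL are assumptions).
  # Note: asadmin create-jvm-options requires escaping colons with a backslash.
  <glassfish install path>/glassfish4/bin/asadmin create-jvm-options '-Ddoi.mdcbaseurlstring=https\://api.test.datacite.org'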

New Database Settings

:DisplayMDCMetrics can be set to false to disable display of MDC metrics.
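
Database settings are changed via the admin API, for example:

  curl -X PUT -d false http://localhost:8080/api/admin/settings/:DisplayMDCMetrics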

Notes for Tool Developers and Integrators

Preview Mode

Tool developers can now add the hasPreviewMode parameter to their file-level external tools. This setting provides an embedded, simplified view of the tool on the file pages of any installation that installs the tool. See Building External Tools for more information.
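
As a rough sketch, the parameter is added to the tool's JSON manifest, which is then registered via the admin API. The endpoint below is part of the admin API; the file name previewer-tool.json is a hypothetical placeholder, and the full manifest schema (including where hasPreviewMode goes) is described in Building External Tools:

  # Sketch: register an external tool whose manifest includes hasPreviewMode (hypothetical file name).
  curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file previewer-tool.json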

API Token Management

If your tool writes content back to Dataverse, you can now take advantage of administrative endpoints that delete and re-create API tokens. You can also use an endpoint that provides the expiration date of a specific API token. See the Native API guide for more information.
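
A sketch of these calls, with endpoint paths as described in the Native API guide (verify against your version; $API_TOKEN is a placeholder):

  # Get the expiration date of the token.
  curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/token
  # Delete the token.
  curl -X DELETE -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/token
  # Recreate the token.
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/users/token/recreate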

View Unpublished Data Using Search API

If you pass an API token, the Search API output will include unpublished content that the associated user has permission to see.
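
For example (hypothetical host; the token can also be passed as a query parameter):

  curl -H "X-Dataverse-key: $API_TOKEN" "http://localhost:8080/api/search?q=data"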

Complete List of Changes

For the complete list of code changes in this release, see the 4.18 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

  1. Undeploy the previous version.
  • <glassfish install path>/glassfish4/bin/asadmin list-applications
  • <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
  2. Stop glassfish, remove the generated directory, and start glassfish again.
  • service glassfish stop
  • remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
  • service glassfish start
  3. Deploy this version.
  • <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.war
  4. Restart glassfish.

  5. Update the Citation Metadata Block.

  • wget https://github.com/IQSS/dataverse/releases/download/v4.18/citation.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
  6. (Recommended) Enable Make Data Count if your installation plans to make use of it at some point in the future.