documentation for validate-files-on-publish. (#6558)
landreev committed Apr 1, 2020
1 parent 9280665 commit 58ac83b
Showing 2 changed files with 23 additions and 1 deletion.
6 changes: 5 additions & 1 deletion doc/sphinx-guides/source/admin/troubleshooting.rst
@@ -16,7 +16,11 @@ See the :doc:`/api/intro` section of the API Guide for a high level overview of
A Dataset Is Locked And Cannot Be Edited or Published
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's normal for the ingest process described in the :doc:`/user/tabulardataingest/ingestprocess` section of the User Guide to take some time, but if hours or days have passed and the dataset is still locked, you may want to inspect the locks and consider deleting some or all of them. If you are deleting an ingest lock, it is recommended to restart the application server, to make sure the ingest job is no longer running in the background.
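
As a quick sketch, the locks can be inspected, and removed if necessary, via the native API. This assumes a local installation, a superuser API token in ``$SUPERUSER_API_TOKEN``, and a hypothetical dataset database id of 42; see :doc:`dataverses-datasets` for the details of managing datasets as an admin::

  # list the locks currently held on the dataset:
  curl http://localhost:8080/api/datasets/42/locks

  # delete the locks (superusers only):
  curl -H "X-Dataverse-key: $SUPERUSER_API_TOKEN" -X DELETE http://localhost:8080/api/datasets/42/locks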

A dataset is locked with a lock of type "finalizePublication" (the lock type appears in the "reason" column of the DatasetLock table) while the persistent identifiers of the datafiles in the dataset are registered or updated, and/or while the physical files are being validated by recalculating the checksums and verifying them against the values stored in the database, before the publication process can be completed. (Note that either of the two tasks can be disabled with database settings - see :doc:`config`.) If a dataset has been stuck in this state for a long period of time, check the "Info" value of the entry in the corresponding DatasetLock table. If it says "FILE VALIDATION ERROR", it means that one or more of the files have failed the validation, and the problem must be resolved (or the datafile purged from the dataset) before you delete the lock and advise the user to try publishing again.

Real issues that have resulted in corrupted datafiles during normal operation of Dataverse in the past:

- Botched file deletes: while a datafile is in DRAFT, attempting to delete it from the dataset also involves deleting the physical files. In the past we have observed partially successful deletes that removed the physical files, but failed to delete the entry from the database - resulting in a datafile linked to a missing file. We believe we have addressed what was causing this condition, so it shouldn't happen again, but there may be a datafile in this state in your database. Solving the issue involves either restoring the file from backups or, if that is not an option, purging the datafile from the database and asking the user to upload the file again.

- A failed tabular data ingest that leaves the datafile un-ingested, but with the physical file already replaced by the generated tab-delimited version of the data. Such a datafile will fail to validate because the checksum in the database is that of the original file and will not match the checksum of the tab-delimited version. Luckily, this is easily reversible: the uploaded original should be saved in your storage with the .orig extension, so simply swapping the .orig copy with the main file associated with the datafile will fix it (see the sketch below). Again, we believe this condition should not occur in Dataverse versions 4.20+, but you may have legacy cases on your server.

The goal of the validation framework is to catch these types of conditions while the dataset is still in DRAFT.
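As an illustration, here is a minimal sketch of diagnosing a failed-validation datafile under filesystem storage with MD5 checksums. The database, table, and column names below reflect our understanding of a standard installation, and the datafile id (123) and storage paths are hypothetical placeholders - verify everything against your own setup::

  # look up the checksum on record and the storage identifier of the suspect datafile:
  psql dvndb -c "SELECT df.checksumtype, df.checksumvalue, o.storageidentifier
                 FROM datafile df JOIN dvobject o ON o.id = df.id
                 WHERE df.id = 123;"

  # recalculate the checksum of the physical file and compare the two values:
  md5sum /usr/local/dvn/data/10.5072/FK2/ABCDEF/17a8c6dd2f5-4a9b3e2d1c0f

  # if the file was clobbered by a failed tabular ingest, swap the saved
  # original (the .orig copy) back in:
  cd /usr/local/dvn/data/10.5072/FK2/ABCDEF
  mv 17a8c6dd2f5-4a9b3e2d1c0f 17a8c6dd2f5-4a9b3e2d1c0f.bak
  mv 17a8c6dd2f5-4a9b3e2d1c0f.orig 17a8c6dd2f5-4a9b3e2d1c0f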

If the finalizePublication lock has the info label "Registering PIDs for Datafiles" or "Validating Datafiles Asynchronously", and the dataset has been in this state for hours or longer, it is reasonably safe to assume that the process is stuck (for example, it may have been interrupted by an application server restart or a system crash). In that case you may want to remove the lock (and make sure the application server is restarted, so that no background job is still working on the dataset) and advise the user to try publishing again.

See :doc:`dataverses-datasets`.

18 changes: 18 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -1362,6 +1362,10 @@ If you don't want to register file-based PIDs for your installation, set:

Note: File-level PID registration was added in 4.9 and is required until version 4.9.3.

Note: When there are more than N files in the dataset (where N is configured by the database setting ``:PIDAsynchRegFileCount``, default: 10), the dataset will be locked and the registration of the file PIDs will be performed asynchronously.
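For example, to raise that threshold to 100 files:

``curl -X PUT -d '100' http://localhost:8080/api/admin/settings/:PIDAsynchRegFileCount``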



.. _:IndependentHandleService:

:IndependentHandleService
+++++++++++++++++++++++++

@@ -1372,6 +1376,20 @@

By default this setting is absent and Dataverse assumes it to be false.

``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:IndependentHandleService``

.. _:FileValidationOnPublishEnabled:

:FileValidationOnPublishEnabled
+++++++++++++++++++++++++++++++

Toggles validation of the physical files in the dataset when it is published, by recalculating the checksums and comparing them against the values stored in the DataFile table. By default this setting is absent and Dataverse assumes it to be true.

If you don't want the datafiles to be validated on publish, set:

``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FileValidationOnPublishEnabled``
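
To go back to the default behavior (validation enabled), remove the setting:

``curl -X DELETE http://localhost:8080/api/admin/settings/:FileValidationOnPublishEnabled``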

Note: When there are more than N files in the dataset (where N is configured by the database setting ``:PIDAsynchRegFileCount``, default: 10), the dataset will be locked and the validation will be performed asynchronously, similarly to how the assignment of persistent identifiers to datafiles is handled.


:ApplicationTermsOfUse
++++++++++++++++++++++
