Merge pull request #95 from JVickery-TBS/feature/remove-unsupported-validation-reports

Clean Validation Reports Job
duttonw authored Dec 9, 2024
2 parents 1073c80 + 9651214 commit ecf8bdd
Showing 2 changed files with 47 additions and 3 deletions.
13 changes: 10 additions & 3 deletions README.md
@@ -139,6 +139,13 @@ resources use the following option:

    ckanext.validation.show_badges_in_listings = False

### Clean validation reports

To prevent the extension from keeping validation reports for resources whose format is not supported, enable the following option (defaults to False):

    ckanext.validation.clean_validation_reports = True

When a resource is updated and its format is not listed in `ckanext.validation.formats`, a job is enqueued to remove the validation reports from that resource.

## How it works

@@ -154,13 +161,13 @@ hosted in CKAN itself or elsewhere. Whenever a resource of the appropriate
format is created or updated, the extension will validate the data against a
collection of checks. This validation is powered by
[Frictionless Framework](https://github.com/frictionlessdata/framework), a very
powerful data validation library developed by the [Open Knowledge Foundation](https://okfn.org)
as part of the [Frictionless Data](https://frictionlessdata.io) project.
Frictionless Framework provides an extensive suite of [checks](https://framework.frictionlessdata.io/docs/checks/baseline.html)
that cover common issues with tabular data files.

These checks include structural problems like missing headers or values, blank
rows, etc., but can also validate the data contents themselves (see
[Data Schemas](#data-schemas)) or even run [custom checks](https://framework.frictionlessdata.io/docs/guides/validating-data.html#custom-checks).

The result of this validation is a JSON report. This report contains all the
@@ -427,7 +434,7 @@ to get up and running just by adding the following fields to the

Here's more detail on the fields added:

* `schema`: This can be a [Table Schema](http://frictionlessdata.io/specs/table-schema/)
JSON object or a URL pointing to one. In the UI form you can upload a JSON file, link to one by
providing a URL, or enter it directly. If uploaded, the file contents will be
read and stored in the `schema` field. In all three cases the contents will be
37 changes: 37 additions & 0 deletions ckanext/validation/plugin/__init__.py
@@ -303,6 +303,9 @@ def after_update(self, context, data_dict):

_run_async_validation(resource_id)

if _should_remove_unsupported_resource_validation_reports(data_dict):
    p.toolkit.enqueue_job(fn=_remove_unsupported_resource_validation_reports, args=[resource_id])

# IPackageController

def before_index(self, index_dict):
@@ -344,3 +347,37 @@ def _get_underlying_file(wrapper):
        return wrapper.stream
    return wrapper.file


def _should_remove_unsupported_resource_validation_reports(res_dict):
    if not t.h.asbool(t.config.get('ckanext.validation.clean_validation_reports', False)):
        return False
    return (res_dict.get('format', u'').lower() not in settings.SUPPORTED_FORMATS
            and (res_dict.get('url_type') == 'upload'
                 or not res_dict.get('url_type'))
            and (t.h.asbool(res_dict.get('validation_status', False))
                 or t.h.asbool(res_dict.get('extras', {}).get('validation_status', False))))


def _remove_unsupported_resource_validation_reports(resource_id):
    """
    Callback to remove unsupported validation reports.
    Controlled by config value: ckanext.validation.clean_validation_reports.
    Double check the resource format. Only supported Validation formats should have validation reports.
    If the resource format is not supported, we should delete the validation reports.
    """
    context = {"ignore_auth": True}
    try:
        res = p.toolkit.get_action('resource_show')(context, {"id": resource_id})
    except t.ObjectNotFound:
        log.error('Resource %s does not exist.', resource_id)
        return

    if _should_remove_unsupported_resource_validation_reports(res):
        log.info('Unsupported resource format "%s". Deleting validation reports for resource %s',
                 res.get(u'format', u''), res['id'])
        try:
            p.toolkit.get_action('resource_validation_delete')(context, {
                "resource_id": res['id']})
            log.info('Validation reports deleted for resource %s', res['id'])
        except t.ObjectNotFound:
            log.error('Validation reports for resource %s do not exist', res['id'])
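
The flow added in this commit (check on update, enqueue a job, re-check inside the job, then delete) can be tried without a CKAN install using the rough sketch below. Everything in it is a hypothetical stand-in: `SUPPORTED_FORMATS`, `RESOURCES`, `REPORTS`, `should_clean` and `cleanup_job` are illustrative names, the format set is an assumed value, and the status check is simplified to "any non-empty `validation_status`" rather than going through CKAN's `asbool` helper as the real code does.

```python
# Stand-alone sketch of the cleanup flow; it does not import CKAN.
# SUPPORTED_FORMATS, RESOURCES, REPORTS and both function names are
# hypothetical stand-ins, not part of ckanext-validation.

SUPPORTED_FORMATS = {"csv", "xls", "xlsx"}  # assumed value, for illustration

# In-memory stand-ins for the resource table and the stored reports.
RESOURCES = {
    "res-1": {"id": "res-1", "format": "PDF", "url_type": "upload",
              "validation_status": "success"},
    "res-2": {"id": "res-2", "format": "CSV", "url_type": "upload",
              "validation_status": "success"},
}
REPORTS = {"res-1": {"valid": True}, "res-2": {"valid": True}}


def should_clean(res_dict, clean_enabled):
    """Return True when the resource's reports should be removed."""
    if not clean_enabled:
        return False
    fmt = (res_dict.get("format") or "").lower()
    url_type = res_dict.get("url_type")
    # Simplification: any non-empty status counts as "has a report".
    extras = res_dict.get("extras") or {}
    has_report = bool(res_dict.get("validation_status")
                      or extras.get("validation_status"))
    return (fmt not in SUPPORTED_FORMATS
            and (not url_type or url_type == "upload")
            and has_report)


def cleanup_job(resource_id, clean_enabled=True):
    """Re-read the resource when the job runs, then delete if still needed."""
    res = RESOURCES.get(resource_id)
    if res is None:
        return "missing"
    if not should_clean(res, clean_enabled):
        return "kept"
    REPORTS.pop(resource_id, None)
    return "deleted"
```

Calling `cleanup_job("res-1")` removes the PDF resource's report, while `cleanup_job("res-2")` leaves the CSV resource's report in place. Repeating the check inside the job matters because the job runs later than the update that enqueued it, so the resource may have changed format again or been deleted in the meantime.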
