Configuration Driven Data Integrity Checks #319
Unanswered
jbee
asked this question in
Specs & RFCs
Replies: 0 comments
This is a quick draft of a new feature in DHIS2 for configuration driven data integrity checks.
This means a generic check runner picks up a configuration file that describes each check: its name, description, severity, and the SQL to run to perform the check.
In general this will lead to a situation where DHIS2 has 2 types of checks: "hard coded" checks and configuration based checks.
While both can be handled in a uniform way in memory (see Backend Implementation), both sources will continue to contribute checks.
Web API
The `DataIntegrityController` is extended with a new endpoint `GET /dataIntegrity` to list all the available check names (IDs). This includes both "hard coded" checks and the new configuration based ones.
Running the checks is still done using `POST /dataIntegrity`, listing the checks to perform using the `checks` parameter. This again supports both "hard coded" and configuration based checks. For backwards compatibility, not providing any `checks` will run all "hard coded" checks. Since this endpoint has so far responded with a web message containing the job configuration report, it cannot be used to return the check outcome and stay backwards compatible.
To return check results directly from an endpoint, two further endpoints are introduced:
`POST /dataIntegrity/summary` (also has a `checks` parameter to run one or more checks)
`POST /dataIntegrity/details` (also has a `checks` parameter to run one or more checks)
These are used to run either the summary SQL or the details SQL of configuration based checks.
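As a rough client-side sketch of the endpoints listed here: the paths are the ones named in this draft, but the draft does not say how `checks` is encoded, so the comma-separated query parameter below is an assumption, as are the helper names.

```python
from urllib.parse import urlencode

# Paths as named in this draft.
ENDPOINTS = {
    "run": "/dataIntegrity",
    "summary": "/dataIntegrity/summary",
    "details": "/dataIntegrity/details",
}


def list_request():
    """Request that lists all available check names (IDs)."""
    return "GET", "/dataIntegrity"


def build_request(view, checks=()):
    """Return (method, path) for running checks in the given view.

    Assumption: `checks` is sent as one comma-separated query parameter.
    With no checks given the parameter is omitted entirely.
    """
    if view not in ENDPOINTS:
        raise ValueError(f"unknown view: {view}")
    path = ENDPOINTS[view]
    if checks:
        # safe="," keeps the separating commas readable in the URL
        path += "?" + urlencode({"checks": ",".join(checks)}, safe=",")
    return "POST", path
```

For example, `build_request("summary", ["check_a", "check_b"])` targets the summary endpoint with both (hypothetical) check names, while `build_request("run")` corresponds to the backwards compatible call without any `checks`.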
If a check does not have one or the other, this is indicated by a `409 Conflict` when the check was run individually. When multiple checks are run, checks that do not support the view type are skipped. A client can distinguish a successful check with no issues found from a skipped check by the presence of a member with the check's name in the root object. If such a member is absent, the check did not run.
Example response format for a summary request:
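As a minimal sketch of the presence rule, a client could sort the requested checks as follows. It assumes the root object contains only check members; the check names and values in the payload are made up for illustration and are not taken from DHIS2.

```python
def classify_summary(requested, response):
    """Split requested check names into (clean, failing, skipped).

    A check ran if the root object has a member with its name;
    an absent member means the check was skipped.
    """
    clean, failing, skipped = [], [], []
    for name in requested:
        if name not in response:
            skipped.append(name)
        elif response[name].get("count", 0) == 0:
            clean.append(name)
        else:
            failing.append(name)
    return clean, failing, skipped


def format_percentage(check):
    """`percentage` is optional, so fall back to a placeholder."""
    pct = check.get("percentage")
    return "n/a" if pct is None else f"{pct:.1f}%"


# Hypothetical summary payload: check_c was requested but skipped.
example = {
    "check_a": {"count": 12, "percentage": 4.8},
    "check_b": {"count": 0},
}

clean, failing, skipped = classify_summary(
    ["check_a", "check_b", "check_c"], example
)
```

Here `check_b` counts as a successful run with no issues, while `check_c` is reported as skipped because it has no member in the root object.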
`percentage` is not always possible to calculate; the member will not be present if that is the case.
Example response format for a details request:
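A small sketch of how a client might walk a details response, one line per issue. The member keys `id` and `name`, and `refs` being a list of strings, are assumptions made for illustration; this draft only names the `issues`, `comment` and `refs` members, and the payload is invented.

```python
def issue_lines(response):
    """Flatten a details response into one printable line per issue."""
    lines = []
    for check_name, check in response.items():
        for issue in check.get("issues", []):
            line = f"{check_name}: {issue['name']} ({issue['id']})"
            if issue.get("comment"):  # optional context for the violation
                line += f" - {issue['comment']}"
            if issue.get("refs"):  # optional references to other elements
                line += " [refs: " + ", ".join(issue["refs"]) + "]"
            lines.append(line)
    return lines


# Hypothetical details payload with one check and two issues.
example_details = {
    "check_a": {
        "issues": [
            {"id": "id1", "name": "First object"},
            {
                "id": "id2",
                "name": "Second object",
                "comment": "no options",
                "refs": ["id3", "id4"],
            },
        ]
    }
}

lines = issue_lines(example_details)
```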
Each conflicting metadata object causes one entry in the `issues` list stating the ID, name and optionally a `comment`.
The `comment` is used to provide additional information about the context of the violation for checks where this makes sense.
The `refs` can optionally be used to provide references to other elements important in the conflict.
The `summary` and `details` formats are identical except that check objects in `summary` have the `count` and `percentage` members while check objects in `details` have the `issues` member.
UI
To run the summary of one or more checks, these are selected in a multi-select drop-down before clicking a Run button:
The selection is populated using the `GET /dataIntegrity` endpoint.
The summary is presented as a table:
When clicking the Details link in the above table, the details table is presented:
Backend Implementation
Files
Configuration based checks are bundled with the application as `.yaml` files in the folder `data-integrity-checks`.
We use the same files found in the project https://github.com/dhis2/metadata-assessment.
To find the files in the archive, a root `data-integrity-checks.yaml` file refers to the individual checks that should be included.
This avoids one huge file that is hard to navigate and read, and it means the `.yaml` files used can stay the same for the current "work-around" solution and the solution where these are included directly in the archive.
The `summary_uid`, `details_uid` and `section_order` properties will be ignored as they serve no purpose when used directly within core.
In Memory Representation
Checks from files are loaded on first use and converted to an in-memory record describing the check including a function to run the check.
The same representation can also be used for hard coded checks that support the summary and/or details format.
This means that once the checks have been loaded, both those from YAML sources and the explicit hard coded ones that might do their work in memory, they are handled uniformly.
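The uniform handling described in this section could be sketched roughly as follows. This is an illustration in Python rather than the actual backend code. The class and function names, the `summary_sql` and `details_sql` keys, and the `run_sql` callback are all assumptions; only `name`, `description`, `severity` and the three ignored properties come from this draft.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Properties this draft says are ignored when used within core.
IGNORED_KEYS = {"summary_uid", "details_uid", "section_order"}


@dataclass(frozen=True)
class DataIntegrityCheck:
    """In-memory record of a check, including functions to run it."""
    name: str
    description: str
    severity: str
    run_summary: Optional[Callable[[], dict]] = None
    run_details: Optional[Callable[[], dict]] = None


class UnsupportedView(Exception):
    """A single requested check lacks the view (maps to 409 Conflict)."""


def from_config(config, run_sql):
    """Convert one parsed YAML document (a dict) into a check record."""
    config = {k: v for k, v in config.items() if k not in IGNORED_KEYS}

    def bind(sql):
        return (lambda: run_sql(sql)) if sql else None

    return DataIntegrityCheck(
        name=config["name"],
        description=config.get("description", ""),
        severity=config.get("severity", ""),
        run_summary=bind(config.get("summary_sql")),  # assumed key name
        run_details=bind(config.get("details_sql")),  # assumed key name
    )


class CheckRegistry:
    """Holds checks from all sources; loads them on first use."""

    def __init__(self, loaders):
        self._loaders = loaders
        self._checks = None

    def checks(self):
        if self._checks is None:
            self._checks = {
                c.name: c for load in self._loaders for c in load()
            }
        return self._checks


def run_checks(registry, names, view):
    """Run checks by name; return {name: result} for those that ran."""
    if view not in ("summary", "details"):
        raise ValueError(f"unknown view: {view}")
    results = {}
    for name in names:
        runner = getattr(registry.checks()[name], f"run_{view}")
        if runner is None:
            if len(names) == 1:
                raise UnsupportedView(name)
            continue  # skipped: no member with this name in the result
        results[name] = runner()
    return results
```

A hard coded check fits the same record by passing plain Python callables for `run_summary` and `run_details`, so the runner does not need to know which source a check came from.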