Use Achilles results to improve performance #455
Replies: 1 comment
This is an interesting idea that we can definitely consider as part of a discussion about DQD performance! It's probably worth starting a larger thread about performance, which would include gathering some benchmarks from different institutions/databases. Do you guys collect runtimes of DQD, ACHILLES, etc. in EHDEN? This also got me thinking again about the process of running DQD and how it fits in alongside all of the other OHDSI tools. Some ACHILLES analyses could technically serve as DQ checks themselves (e.g. number of records with concept_id 0), while other ACHILLES analyses might fail altogether if you have a fatal CDM conformance issue. Maybe there is a way to build a more structured flow that allows the quality and characterization checks/analyses to execute in the "right" order and leverage each other's results, to improve performance and aid users' interpretation of the results (this is maybe the same as, or similar to, @clairblacketer's idea for an improved DQD report).
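To make the concept_id 0 example concrete, here is a minimal sketch of reading that number from precomputed Achilles results instead of scanning the CDM table. It assumes a results table named achilles_results with analysis_id, stratum_1 and count_value columns, and that analysis 401 holds condition occurrence record counts by concept; the helper name and the toy rows are made up for illustration, and an in-memory SQLite database stands in for the real results schema.

```python
import sqlite3


def unmapped_record_count(conn, analysis_id):
    """Sum the record counts stored under concept 0 for one 'records by concept_id' analysis."""
    row = conn.execute(
        "SELECT COALESCE(SUM(count_value), 0) FROM achilles_results "
        "WHERE analysis_id = ? AND stratum_1 = '0'",
        (analysis_id,),
    ).fetchone()
    return row[0]


# Toy stand-in for an Achilles results table (values are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE achilles_results "
    "(analysis_id INTEGER, stratum_1 TEXT, count_value INTEGER)"
)
conn.executemany(
    "INSERT INTO achilles_results VALUES (?, ?, ?)",
    [
        (401, "0", 25),        # condition records with concept_id 0
        (401, "201826", 975),  # condition records with a mapped concept
        (701, "0", 3),         # a different analysis, ignored below
    ],
)

print(unmapped_record_count(conn, 401))
```

The lookup touches only a handful of aggregated rows, which is where the potential speed-up over a full table scan would come from.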
This thread is to explore whether we can use the Achilles results to calculate the DQD checks. In my experience, most OMOP sites run both Achilles and DQD anyway. Some statistics pre-computed by Achilles, like concept counts, could considerably speed up DQD.
I do realise that DQD was purposefully designed as a stand-alone tool, and that the data quality tool in Achilles itself was deprecated.
But we see data partners struggling with the long computation time of the data quality tools.
(Admittedly, in EHDEN we added a third tool ourselves, 'CdmInspection'. Then again, it already reuses some Achilles results.)
Some examples of DQD checks that can use Achilles results:
- fkDomain, fkClass and isStandardValidConcept can use Achilles analyses xx01 ("Number of xx records, by xx_concept_id").
- plausibleValueLow and plausibleValueHigh can use Achilles analyses xx20 ("Number of xx records by xx start month").
- plausibleTemporalAfter can use Achilles analyses xx11 ("Number of xx records with end date < start date").
- plausibleDuringLife can use Achilles analyses 511-515 ("Distribution of time from death to last condition/drug/visit/procedure/observation").

Note that some are not implemented for all domains. And there are a few differences, e.g. how observation period is handled.
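As a rough illustration of the first example, the sketch below approximates an isStandardValidConcept-style result from an xx01 analysis by joining the per-concept counts to the vocabulary. It is not DQD's actual implementation: the function name and toy data are invented, SQLite stands in for the real database, and the choices to skip concept 0 and to treat a missing standard_concept flag as non-standard are assumptions.

```python
import sqlite3


def non_standard_share(conn, analysis_id):
    """Return (violating, total) record counts for one xx01-style analysis."""
    total = conn.execute(
        "SELECT COALESCE(SUM(count_value), 0) FROM achilles_results "
        "WHERE analysis_id = ?",
        (analysis_id,),
    ).fetchone()[0]
    violating = conn.execute(
        """
        SELECT COALESCE(SUM(ar.count_value), 0)
        FROM achilles_results ar
        JOIN concept c ON c.concept_id = CAST(ar.stratum_1 AS INTEGER)
        WHERE ar.analysis_id = ?
          AND c.concept_id <> 0  -- assumption: unmapped records are out of scope here
          AND (c.standard_concept IS NULL
               OR c.standard_concept <> 'S'
               OR c.invalid_reason IS NOT NULL)
        """,
        (analysis_id,),
    ).fetchone()[0]
    return violating, total


# Toy vocabulary and Achilles results (all values are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept "
    "(concept_id INTEGER, standard_concept TEXT, invalid_reason TEXT)"
)
conn.execute(
    "CREATE TABLE achilles_results "
    "(analysis_id INTEGER, stratum_1 TEXT, count_value INTEGER)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (0, None, None),  # the "no matching concept" entry
        (1, "S", None),   # standard and valid
        (2, None, None),  # non-standard
        (3, "S", "D"),    # standard flag set, but marked invalid
    ],
)
conn.executemany(
    "INSERT INTO achilles_results VALUES (?, ?, ?)",
    [(401, "1", 900), (401, "2", 60), (401, "3", 40), (401, "0", 100)],
)

violating, total = non_standard_share(conn, 401)
print(violating, total)
```

Because the join runs over one row per distinct concept rather than one row per clinical record, the cost depends on the number of distinct concepts, not on the table size. Concepts missing from the vocabulary drop out of the inner join, which is the kind of difference from the original check that would need to be reconciled.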
I have also looked at the other DQD checks, and I think 19 checks can at least be approximated using Achilles results. A few DQD checks (I ended up with four) would need additional Achilles analyses.
Tagging @clairblacketer, @katy-sadowski, @fdefalco, @andrewwilliams