Generalizability

Purpose

OHDSI Study Protocol: OHDSI_Study_Protocol_v1.0

This study aims to evaluate and characterize the generalizability (i.e., coverage) of the OMOP vocabulary terms included in the OMOP2OBO mapping set relative to the OMOP vocabulary terms used at the Observational Health Data Sciences and Informatics (OHDSI) Concept Prevalence study sites.

As described here, the Concept Prevalence study was designed to provide researchers with additional context regarding the frequency at which different clinical codes occur across the OHDSI research network:

We want to study the usage patterns of Concepts across different OMOP CDM instances. This in itself could be useful information to answer many questions, but we have a concrete reason: For any one medical entity, the granularity of codes captured in a data source can vary greatly. For example, Chronic Kidney Disorder stage II can be coded as ICD9 code 585.2 Chronic kidney disease, Stage II (mild); 585.9 Chronic kidney disease, unspecified or even as 586 Renal failure, unspecified. However, this information is key for any cohort definition. Currently, researchers have no way of knowing whether a certain concept with high granularity is even available for selection, or whether they have to use a generic concept in combination with some auxiliary information to define the cohort correctly. Each data source instance is a black box and knowledge about the distribution of the concepts is limited to the very instance researchers have access to. But OHDSI Network Studies are dependent on cohort definitions that work across the network.

Analysis

The main research question is: how does coverage of the OMOP vocabulary terms present in the OMOP2OBO mappings differ across the OHDSI Concept Prevalence study sites?

The specific aims of this study are as follows:

Examine OMOP2OBO coverage across the Concept Prevalence sites by identifying:
- OMOP vocabulary terms present in OMOP2OBO and in at least one site
- OMOP vocabulary terms present only in OMOP2OBO and in none of the Concept Prevalence sites
- OMOP vocabulary terms present only in one or more sites (i.e., not in OMOP2OBO)
Demonstrate the potential of OMOP2OBO for biological (including molecular) inference by characterizing differences in OBO ontology term enrichment across the Concept Prevalence sites when varying different aspects of data provenance (e.g., site type, clinical specialty, and site location).
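The three coverage categories in the first aim reduce to set operations on concept IDs. The Python sketch below assumes that OMOP2OBO and each site have already been reduced to sets of standard OMOP concept IDs. The function name, the category labels, and the toy IDs are hypothetical and are not taken from the OMOP2OBO codebase.

```python
from typing import Dict, Set


def coverage_categories(omop2obo: Set[int], sites: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Partition concept IDs into the three coverage categories (hypothetical helper)."""
    # Union of every concept ID observed at any site; empty if no sites are given.
    any_site: Set[int] = set().union(*sites.values())
    return {
        "omop2obo_and_site": omop2obo & any_site,  # in OMOP2OBO and in >= 1 site
        "omop2obo_only": omop2obo - any_site,      # in OMOP2OBO, in no site
        "site_only": any_site - omop2obo,          # in >= 1 site, not in OMOP2OBO
    }


# Toy example with made-up concept IDs.
cats = coverage_categories({1, 2, 3, 4}, {"site_a": {1, 2, 5}, "site_b": {2, 6}})
print({name: sorted(ids) for name, ids in cats.items()})
```

Because the sites are pooled into a single union, each concept lands in exactly one category. A per-site breakdown would repeat the same intersections against each site's set.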

Study Sites

In addition to the Concept Prevalence study sites (n=22), data were obtained from two independent academic medical centers, for a total of 24 sites. High-level descriptions of each site, including the total number of records and concepts, are provided below.

Database	Type	Location	Record Count	Concept Count
Ajou University Database (Ajou)	EHR	Non-US	30,238,709	6,055
Australian Electronic practice based research network (AU-ePBRN)	EHR	Non-US	11,658,378	5,027
Columbia University Medical Center Database (CUMC)	EHR	US	938,078,465	21,502
IBM MarketScan Commercial Database (CCAE)	CLAIMS	US	12,649,562,658	31,570
IBM MarketScan Medicare Supplemental Database (MDCR)	CLAIMS	US	2,770,787,154	25,121
IBM MarketScan Multi-State Medicaid Database (MDCD)	CLAIMS	US	4,283,172,117	19,133
IQVIA Disease Analyzer (DA) France	EHR	Non-US	39,632,134	3,423
IQVIA Disease Analyzer (DA) Germany	EHR	Non-US	851,853,377	9,276
IQVIA Longitudinal Patient Data (LPD) Australia	EHR	Non-US	56,940,803	5,833
IQVIA US Ambulatory EMR (AmbEMR)	EHR	US	10,634,058,375	62,161
IQVIA US Hospital Charge Data Master (CDM)	EHR	US	4,857,228,360	19,352
IQVIA US LRxDx Open Claims (Open Claims)	CLAIMS	US	71,678,847,042	20,083
Japan Medical Data Center database (JMDC)	EHR	Non-US	1,184,325,523	6,833
Korea National Health Insurance Service / National Sample Cohort (NHIS/NSC Korea)	CLAIMS	Non-US	323,096,899	6,667
Medical Information Mart for Intensive Care III (MIMIC3)	EHR	US	124,127,038	3,781
Optum De-Identified Clinformatics Data-Mart-Database—Socio-Economic Status (SES)	CLAIMS	US	13,369,194,028	36,943
Optum De-Identified Clinformatics Data-Mart-Database—Date of Death (DOD)	CLAIMS	US	9,716,879,363	34,853
Optum De-identified Electronic Health Record Dataset (PANTHER)	EHR	US	27,894,204,112	59,777
Premier Healthcare Database (PREMIER)	CLAIMS	US	16,794,698,039	18,903
Stanford Medicine Research Data Repository (STaRR)	EHR	US	416,175,821	11,161
Healthcare Cost and Utilization Project Nationwide Inpatient Sample (HCUP)	EHR	US	744,807,853	9,391
Tufts Medical Center Database (Tufts)	EHR	US	66,863,985	21,118
UCHealth	EHR	US	1,215,613,326	19,073
USC PScanner	EHR	US	29,703,213	11,476

Data

For each data site, standard concepts used at least once in practice were obtained from the Condition Occurrence (i.e., SNOMED CT), Drug Exposure (i.e., ingredient level; RxNorm), and Measurement (i.e., LOINC) tables. The total frequency of each concept was computed. Consistent with the Concept Prevalence study, concepts occurring fewer than 10 times were excluded, and the remaining concepts occurring fewer than 100 times were assigned a count of 100.
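The frequency rule above can be expressed in a few lines. The following is a minimal Python sketch, assuming per-site counts are available as a mapping from concept ID to total frequency. The function and parameter names are hypothetical, and this is not the query the study actually ran.

```python
from typing import Dict


def apply_prevalence_thresholds(
    counts: Dict[int, int], min_count: int = 10, floor: int = 100
) -> Dict[int, int]:
    """Drop rare concepts and raise small counts to a floor (hypothetical helper)."""
    result: Dict[int, int] = {}
    for concept_id, n in counts.items():
        if n < min_count:
            continue  # fewer than 10 occurrences: concept is ignored
        result[concept_id] = max(n, floor)  # counts of 10..99 are reported as 100
    return result


# Toy counts keyed by made-up concept IDs.
print(apply_prevalence_thresholds({1: 5, 2: 10, 3: 99, 4: 2500}))
```

Rounding small counts up to a common floor obscures exact low frequencies. A concept seen 10 times and one seen 99 times therefore become indistinguishable downstream.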

Results

Results are presented by clinical domain (Condition Occurrence, Drug Exposure Ingredients, and Measurements). Overall, the OMOP vocabulary terms included in the OMOP2OBO mapping set provided high coverage of the concepts used at the Concept Prevalence study sites, although coverage differed by site and by clinical domain.
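One way to quantify how coverage varies by site and domain is to compute, for each (site, domain) pair, the share of that site's concepts that appear in OMOP2OBO. The Python sketch below reports both an unweighted, concept-level share and a frequency-weighted share. The weighting is an assumption for illustration, not necessarily the metric used in the reported results. The function name and toy values are hypothetical.

```python
from typing import Dict, Set, Tuple


def site_coverage(omop2obo: Set[int], site_counts: Dict[int, int]) -> Tuple[float, float]:
    """Return (concept-level, frequency-weighted) coverage of one site's concepts."""
    if not site_counts:
        return 0.0, 0.0
    # Site concepts that also appear in the OMOP2OBO mapping set.
    covered = [c for c in site_counts if c in omop2obo]
    concept_cov = len(covered) / len(site_counts)
    total = sum(site_counts.values())
    weighted_cov = sum(site_counts[c] for c in covered) / total if total else 0.0
    return concept_cov, weighted_cov


# Toy example: thresholded counts for one hypothetical site and domain.
print(site_coverage({1, 2}, {1: 100, 2: 300, 3: 100}))
```

Calling this once per site within each domain yields a site-by-domain coverage matrix. The weighted share will exceed the unweighted one whenever the mapped concepts are also the frequently used ones.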
