
clinical-data-mining/NLP_Radiology_MSK_MET

 
 


Natural Language Processing Based Radiology Predictions Validate Genomic Mechanisms Driving Metastasis Identified in MSK-MET

Background Information:

The recently published MSK-MET paper serves as a resource on common metastatic sites, based largely on ICD billing codes. It concluded that chromosomal instability, measured by the fraction of genome altered (FGA), is positively correlated with metastatic burden across cancer types (pan-cancer). However, ICD billing codes are known to be incomplete. Building on this research, a natural language processing (NLP)-based study mined electronic health records (EHR) for sites of metastasis documented in radiology reports.
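To make the FGA measure concrete, here is a minimal sketch. It computes the fraction of profiled bases that fall in copy-number segments whose absolute log2 copy ratio exceeds a threshold. The `Segment` layout and the 0.2 threshold are illustrative assumptions, and the exact FGA definition used by MSK-MET may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical copy-number segment: half-open interval [start, end)."""
    chrom: str
    start: int
    end: int
    log2_ratio: float


def fraction_genome_altered(segments, threshold=0.2):
    """Fraction of profiled bases lying in segments with |log2 ratio| > threshold.

    threshold=0.2 is an assumed default for illustration only.
    """
    total = sum(s.end - s.start for s in segments)
    if total == 0:
        return 0.0
    altered = sum(s.end - s.start for s in segments if abs(s.log2_ratio) > threshold)
    return altered / total


segs = [Segment("1", 0, 600, 0.5), Segment("1", 600, 1000, 0.05)]
print(fraction_genome_altered(segs))  # 600 altered bases out of 1000 profiled
```

Both gains and losses count as altered, because the test uses the absolute log2 ratio.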

Objectives:

Data for this internship project were mined from radiology reports, which serve as a suitable alternative to the billing codes themselves. The purpose of this project is to integrate genomic data with metastasis predictions made from the radiology reports in order to validate the findings reported in MSK-MET. We hypothesize that the genomic biomarkers highlighted in the paper will also emerge when metastatic events are predicted from radiology reports.

Methods:

Published MSK-MET results were compared with NLP-derived results in three areas: genomic features associated with metastatic burden, age at first metastasis, and common sites of metastasis. Statistical analyses included the Wilcoxon signed-rank test and Spearman correlation coefficients.
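The two statistics named above can be sketched in pure Python. The project itself was written in R, where `cor(..., method = "spearman")` and `wilcox.test(..., paired = TRUE)` would be the usual tools. In this sketch, Spearman's rho is the Pearson correlation of average ranks, so ties share the mean of their positions. The Wilcoxon signed-rank statistics W+ and W- are computed from paired differences, with zero differences dropped. The p-value uses a normal approximation without tie or continuity correction. The example pairings (e.g., FGA versus number of metastatic sites per patient, or paired values from the two sources) are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


def wilcoxon_normal_p(w_plus, n):
    """Two-sided p-value from the normal approximation (no tie/continuity correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


fga = [0.1, 0.2, 0.3, 0.4]
n_sites = [1, 2, 2, 5]
print(spearman_rho(fga, n_sites))

w_plus, w_minus, n = wilcoxon_signed_rank([10, 12, 9, 15, 11], [8, 12, 10, 11, 7])
print(w_plus, w_minus, wilcoxon_normal_p(w_plus, n))
```

For small samples an exact null distribution is preferable to the normal approximation. A library routine such as R's `wilcox.test` handles that choice automatically.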

Results:

In the radiology predictions, significant correlations of FGA and of tumor mutational burden (TMB) with metastatic burden generally agreed with those found in MSK-MET. There were two sets of exceptions: 10 cancer types in which a correlation was significant only in the radiology predictions, and 2 cancer types in which it was significant only in MSK-MET. Across all sites of metastasis, the mean difference in age at metastasis between radiology predictions and billing codes was 201 days.
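One way to compute a figure like the mean difference in age at metastasis is sketched below. It assumes that, for each (patient, site) pair, there is a date of first metastasis from each source. For a given patient, a difference in age equals the difference in dates. The dictionary layout, the pairing by (patient, site), and the sign convention (radiology minus billing) are assumptions. The `absolute` flag is a hypothetical option because the README does not say whether signed or absolute differences were averaged.

```python
from datetime import date
from statistics import mean


def mean_day_difference(radiology, billing, absolute=False):
    """Mean of (radiology date - billing date) in days over shared (patient, site) keys.

    radiology, billing: dicts mapping (patient_id, site) -> date of first metastasis.
    Returns None when the two sources share no keys.
    """
    shared = radiology.keys() & billing.keys()
    if not shared:
        return None
    diffs = [(radiology[k] - billing[k]).days for k in shared]
    if absolute:
        diffs = [abs(d) for d in diffs]
    return mean(diffs)


radiology = {("P1", "Liver"): date(2020, 1, 1), ("P1", "Bone"): date(2020, 3, 1),
             ("P2", "Lung"): date(2021, 5, 10)}
billing = {("P1", "Liver"): date(2020, 2, 1), ("P1", "Bone"): date(2020, 3, 1),
           ("P3", "Brain"): date(2019, 1, 1)}
print(mean_day_difference(radiology, billing))
```

Only (patient, site) pairs found by both sources contribute. Sites detected by one source alone would need separate handling.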

Conclusions:

The radiology predictions used to validate MSK-MET were generated with NLP. These results suggest that NLP can be an efficient way to mine clinical data and to support future large-scale genomic analyses.

Link to figures here: https://docs.google.com/presentation/d/13dySh9uD1UiQ6DAcG4AMPk3sGijtokUV/edit?usp=sharing&ouid=103657296078016662765&rtpof=true&sd=true

Languages

  • R 100.0%