some data types and uses
kweav committed Dec 12, 2024
1 parent 2951b9a commit 673c69b
Showing 3 changed files with 83 additions and 1 deletion.
2 changes: 1 addition & 1 deletion 02-data_types.Rmd
@@ -27,7 +27,7 @@ Structured data tables often describe entries in terms of codes from standardize

Clinical notes are, perhaps unsurprisingly, generally shared as seemingly straightforward text files. However, the simple format should not be taken as a suggestion that the data are easy to interpret. Some EHR systems contain dozens of types of notes, covering specialties such as pathology or surgery; specific moments in care such as admission or discharge; particular procedures such as colonoscopies; patient-provider interactions such as telehealth or phone encounters; and many others. In addition to differing in content, these sources may have different layouts and formats, ranging from free-form reports to structured SOAP (subjective, objective, assessment, and plan) formats or even templated procedure reports. Understanding the types of notes available in a given context and where relevant data might be found is a key step in effectively using clinical notes.

- When used in EHR research, both structured data and clinical notes are generally de-identified to protect patient privacy. Patient ID numbers might be replaced with new identifiers, with linkages maintained by institutional “honest brokers” [INSERT REF HERE] charged with providing clinical data for research purposes. In some cases, dates may be changed as well. Clinical notes are generally “de-identified” through specialized software designed to remove names, dates, locations, and other sensitive details. Researchers working with institutions to access clinical data should be sure to understand local data de-identification practices.
+ When used in EHR research, both structured data and clinical notes are generally de-identified to protect patient privacy. Patient ID numbers might be replaced with new identifiers, with linkages maintained by institutional “honest brokers” [@Dhir2008] charged with providing clinical data for research purposes. In some cases, dates may be changed as well. Clinical notes are generally “de-identified” through specialized software designed to remove names, dates, locations, and other sensitive details. Researchers working with institutions to access clinical data should be sure to understand local data de-identification practices.
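
The structured-data steps described above (replacing patient IDs and changing dates) might look roughly like the following. This is a minimal sketch, not a description of any institution's actual practice: the field names, the `BROKER_KEY` secret, and the choice of a keyed hash are all assumptions made for illustration. Many honest brokers keep a lookup table instead, and de-identifying clinical notes requires specialized software that this sketch does not attempt.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret held only by the honest broker, never by researchers.
BROKER_KEY = b"replace-with-a-real-secret"


def pseudonymize_id(patient_id: str) -> str:
    """Map a real patient ID to a stable research identifier via a keyed hash."""
    digest = hmac.new(BROKER_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return "R" + digest.hexdigest()[:12]


def date_offset_days(patient_id: str, max_shift: int = 180) -> int:
    """Derive one shift per patient in [-max_shift, max_shift] days.

    Using the same shift for all of a patient's dates keeps the intervals
    between events (for example, length of stay) intact.
    """
    digest = hmac.new(BROKER_KEY, b"shift:" + patient_id.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(digest.digest()[:4], "big") % (2 * max_shift + 1) - max_shift


def deidentify(record: dict) -> dict:
    """Return a copy of the record with a research ID and shifted dates."""
    shift = timedelta(days=date_offset_days(record["patient_id"]))
    return {
        "research_id": pseudonymize_id(record["patient_id"]),
        "admit_date": record["admit_date"] + shift,
        "discharge_date": record["discharge_date"] + shift,
        "diagnosis_code": record["diagnosis_code"],
    }


# Made-up record for illustration only.
record = {
    "patient_id": "MRN-0012345",
    "admit_date": date(2024, 3, 1),
    "discharge_date": date(2024, 3, 5),
    "diagnosis_code": "E11.9",
}
clean = deidentify(record)
print(clean)
```

Because the shift is derived per patient, a four-day stay remains a four-day stay after de-identification, which matters for analyses that depend on timing.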

## Physiological

53 changes: 53 additions & 0 deletions 03-data_uses.Rmd
@@ -14,6 +14,59 @@ ottrpal::set_knitr_image_path()

## Types of questions that can be asked with clinical data

### Risk Prediction

Risk prediction in clinical research involves using data to assess the likelihood of certain outcomes or events occurring in patients. This could include predicting the risk of developing a particular disease, experiencing a specific complication, or responding to a treatment.

Data used for risk prediction can come from various sources, including:

* Clinical Data: This includes patient demographics, medical history, laboratory results, and imaging studies.
* Genetic Data: Genetic information, such as DNA sequencing results, can provide valuable insights into an individual's susceptibility to certain diseases.
* Environmental and Lifestyle Data: Factors such as diet, exercise habits, smoking status, and environmental exposures can influence disease risk and may be included in risk prediction models.
* Biomarkers: Biological markers indicative of disease or physiological processes can be used as predictors in risk models [@Bodaghi_Fattahi_Ramazani_2023].

Once relevant data is collected, statistical and machine learning techniques can be applied to develop predictive models. These models aim to identify patterns and relationships within the data that are associated with the outcome of interest. Common techniques include logistic regression, decision trees, random forests, support vector machines, and neural networks.

After the model is trained on a dataset, it can be validated using independent datasets to assess its performance and generalizability. Once validated, the model can be used to predict risk in new patients based on their individual characteristics and data.
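
To make the train-then-validate workflow concrete, here is a minimal sketch using logistic regression, one of the techniques listed above. Everything in it is an assumption for illustration: the data are simulated, the two predictors (a standardized age and a smoking flag) and their effect sizes are invented, and the model is fit with plain gradient descent so the example needs only the Python standard library. A real analysis would normally use an established statistics or machine learning package and real, independent validation data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(weights, bias, features):
    """Predicted probability of the outcome for one patient."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_risk(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def auc(scores, labels):
    """Chance that a random case scores higher than a random non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)


def simulate(n):
    """Simulated patients: [standardized age, smoker flag] and a binary outcome."""
    rows, labels = [], []
    for _ in range(n):
        age = rng.gauss(0, 1)
        smoker = float(rng.random() < 0.3)
        true_risk = sigmoid(-1.0 + 1.2 * age + 1.0 * smoker)  # invented effects
        rows.append([age, smoker])
        labels.append(1 if rng.random() < true_risk else 0)
    return rows, labels


train_x, train_y = simulate(400)  # development data
valid_x, valid_y = simulate(200)  # separate data used only for validation

weights, bias = fit_logistic(train_x, train_y)
valid_scores = [predict_risk(weights, bias, x) for x in valid_x]
validation_auc = auc(valid_scores, valid_y)
print(f"validation AUC: {validation_auc:.2f}")
```

The model never sees the validation set while it is being fit, so the AUC gives a more honest view of how well it separates higher-risk from lower-risk patients than the same metric computed on the training data would.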

Clinical prediction rules are a subset of risk prediction models, specific to clinical research.

Examples of risk prediction models ...

Risk prediction models are important because ...

Overall, risk prediction in clinical research allows healthcare professionals to identify individuals at higher risk of certain outcomes, enabling targeted interventions, personalized treatments, and more efficient resource allocation.

### Cohort identification for research

Clinical data plays a crucial role in cohort identification for research purposes. Researchers typically use electronic health records (EHRs), medical databases, or registries to identify cohorts based on specific criteria such as age, gender, medical conditions, treatments, medications, and outcomes. Advanced data mining and natural language processing techniques can also be employed to extract relevant information from unstructured data sources like clinical notes. Once cohorts are identified, researchers can analyze the data to study disease progression, treatment effectiveness, and outcomes.
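
As a rough illustration of criteria-based cohort identification on structured data, the sketch below selects adults with a type 2 diabetes diagnosis code who are taking metformin and excludes anyone with a type 1 diabetes code. The `Patient` class, its field names, and the four records are hypothetical. The ICD-10 prefixes (`E11` for type 2 and `E10` for type 1 diabetes) are real codes, but an actual study would rely on a validated phenotype definition, not a two-line filter.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical, simplified view of one patient's structured EHR data."""
    patient_id: str
    age: int
    diagnoses: set = field(default_factory=set)    # ICD-10 codes
    medications: set = field(default_factory=set)  # lowercase generic names


def has_code(patient: Patient, prefix: str) -> bool:
    """True if any diagnosis code starts with the given ICD-10 prefix."""
    return any(code.startswith(prefix) for code in patient.diagnoses)


def in_cohort(patient: Patient) -> bool:
    """Inclusion: adult, type 2 diabetes, on metformin. Exclusion: type 1 diabetes."""
    included = (
        patient.age >= 18
        and has_code(patient, "E11")
        and "metformin" in patient.medications
    )
    excluded = has_code(patient, "E10")
    return included and not excluded


# Made-up records for illustration only.
patients = [
    Patient("P001", 54, {"E11.9", "I10"}, {"metformin", "lisinopril"}),
    Patient("P002", 16, {"E11.9"}, {"metformin"}),           # under 18
    Patient("P003", 61, {"E11.65", "E10.9"}, {"metformin"}),  # has an exclusion code
    Patient("P004", 47, {"E11.9"}, {"metformin"}),
]

cohort_ids = [p.patient_id for p in patients if in_cohort(p)]
print(cohort_ids)
```

Keeping inclusion and exclusion criteria as separate, named conditions makes the cohort definition easier to review and to report alongside the study results.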

Cohort identification matters for every type of research study. Some specific examples:

* Research Design: Identifying cohorts allows researchers to design studies with appropriate inclusion or exclusion criteria. By selecting specific groups of individuals with similar characteristics or exposures, researchers can investigate hypotheses effectively.
* Clinical Insights: Cohort studies enable researchers to observe the natural history of diseases, track outcomes over time, and assess the effectiveness of interventions or treatments. Understanding how different factors influence disease progression or treatment response can inform clinical decision-making and improve patient care.
* Epidemiological Studies: Cohort identification is crucial for epidemiological research to understand the incidence, prevalence, and risk factors associated with diseases. By following cohorts over time, researchers can identify trends, patterns, and associations that contribute to our understanding of disease causation and prevention.
* Precision Medicine: Identifying cohorts based on genetic profiles, biomarkers, or other specific characteristics allows researchers to tailor treatments and interventions to individual patients. This approach, known as precision medicine, aims to optimize therapeutic outcomes while minimizing adverse effects.
* Healthcare Policy and Planning: Cohort studies provide valuable data for informing healthcare policies, resource allocation, and public health strategies. By identifying high-risk populations or groups with specific healthcare needs, policymakers can develop targeted interventions to improve health outcomes and reduce disparities.

### Case report forms

### Clinical trials/studies

### Retrospective analysis

## Research Design Considerations

Researchers need to manage the biases associated with clinical data, especially EHR data, deliberately and early in the research process, well before data analysis begins.

One of the most important challenges in using EHR data in cancer research is that, as in many other fields, healthcare data are subject to several types of bias that result from disparities in the delivery of healthcare. Overall, cancer research has historically relied on data from high-resource academic medical centers, which disproportionately provide care to patients who are White, have high socioeconomic status, and live in urban areas. As a result, medical knowledge produced from these data has disproportionately benefited those patients. Several sources of bias are prevalent in EHR data. For example,

* *Information representativeness bias* occurs when certain groups are disproportionately absent from the EHR because they have no contact with the healthcare system.
* On the other hand, *information presence bias* occurs when certain groups are represented in the EHR but have disproportionately less comprehensive healthcare data due to issues such as overall lower access to and use of healthcare services, lack of a primary care provider, lack of access to specialty care, and lack of access to digital resources (e.g., patient portals, home sensors, telehealth) that can be used to provide healthcare data.
* *Treatment biases* happen when certain groups receive disproportionate access to more advanced treatments, which is often determined by social drivers such as insurance, distance, health literacy, and socioeconomic status.
* Lastly, *algorithm bias* further amplifies these previous sources of bias by leveraging biased EHR data to make predictions about diagnosis, treatment, and prognosis that are used by clinicians to make potentially biased healthcare decisions, which are then documented in the EHR.

Recent advances in sophisticated and costly technology such as genetic testing, artificial intelligence, and digital health, which are disproportionately available in high-resource healthcare systems, further compound the problem.
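
One simple way to look for information representativeness bias before analysis is to compare each group's share of the EHR cohort with its share of a reference population. The sketch below is an assumption-laden illustration: the urban and rural labels, the counts, the reference shares, and the 0.8 flagging threshold are all invented, and a real check would use a defensible reference population such as the institution's catchment area.

```python
from collections import Counter


def representation_ratios(cohort_groups, reference_shares):
    """Ratio of each group's cohort share to its reference-population share.

    A ratio below 1 means the group is underrepresented in the cohort.
    """
    counts = Counter(cohort_groups)
    total = len(cohort_groups)
    return {
        group: (counts.get(group, 0) / total) / share
        for group, share in reference_shares.items()
    }


# Invented numbers for illustration only.
cohort_groups = ["urban"] * 90 + ["rural"] * 10
reference_shares = {"urban": 0.8, "rural": 0.2}

ratios = representation_ratios(cohort_groups, reference_shares)
# Hypothetical threshold: flag groups with less than 80% of their expected share.
underrepresented = [group for group, ratio in ratios.items() if ratio < 0.8]

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
print("underrepresented:", underrepresented)
```

A check like this only detects who is missing from the data. It does not measure information presence bias, which would require comparing how complete each group's records are.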


## Conclusion
29 changes: 29 additions & 0 deletions book.bib
@@ -1,3 +1,32 @@
@article{Bodaghi_Fattahi_Ramazani_2023,
title={Biomarkers: Promising and valuable tools towards diagnosis, prognosis and treatment of Covid-19 and other diseases},
volume={9},
ISSN={2405-8440},
url={https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9884646/},
DOI={10.1016/j.heliyon.2023.e13323},
abstractNote={The use of biomarkers as early warning systems in the evaluation of disease risk has increased markedly in the last decade. Biomarkers are indicators of typical biological processes, pathogenic processes, or pharmacological reactions to therapy. The application and identification of biomarkers in the medical and clinical fields have an enormous impact on society. In this review, we discuss the history, various definitions, classifications, characteristics, and discovery of biomarkers. Furthermore, the potential application of biomarkers in the diagnosis, prognosis, and treatment of various diseases over the last decade are reviewed. The present review aims to inspire readers to explore new avenues in biomarker research and development.},
number={2},
journal={Heliyon},
author={Bodaghi, Ali and Fattahi, Nadia and Ramazani, Ali},
year={2023},
month=jan,
pages={e13323}
}

@article{Dhir2008,
title={A multidisciplinary approach to honest broker services for tissue banks and clinical data},
volume={113},
url={https://acsjournals-onlinelibrary-wiley-com.fhcrc.idm.oclc.org/doi/10.1002/cncr.23768},
DOI={10.1002/cncr.23768},
issue={7},
journal={Cancer},
author={Rajiv Dhir and Ashok A Patel and Sharon Winters and Michelle Bisceglia and Dennis Swanson and Roger Aamodt and Michael J Becich},
year={2008},
month=aug,
pages={1705-1715}
}


@Manual{rmarkdown2021,
title = {rmarkdown: Dynamic Documents for R},
author = {JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone},
