NIGMS/Analysis-of-Biomedical-Data-for-Biomarker-Discovery

Image from https://doi.org/10.3389/fpsyt.2020.00432


Analysis of Biomedical Data for Biomarker Discovery

Dr. Christopher L. Hemme

The University of Rhode Island College of Pharmacy

Contents

Introduction

Welcome to the cloud-based learning module Analysis of Biomedical Data for Biomarker Discovery, presented by the Rhode Island INBRE Molecular Informatics Core (MIC) at the University of Rhode Island. The module was developed by Dr. Christopher L. Hemme, Director of the MIC, using data from Dr. Nisanne Ghonem at the Department of Biomedical and Pharmaceutical Sciences, College of Pharmacy, University of Rhode Island. Our goal with this module is to bridge the gap between bioinformaticians (particularly those from a non-clinical background) and clinicians or clinical researchers, who often view the same biomedical data in very different ways. For example, bioinformaticians may not be familiar with the conventions for data presentation and visualization in the clinical literature, while clinicians are often overwhelmed by the volume of data generated by modern bioinformatics methods or may question the utility of bioinformatics results compared to more traditional clinical methods. We present this challenge in terms of the discovery of clinical biomarkers, that is, biological measures of health and disease. For the clinician, a biomarker must be cheap and easy to measure, accurate, and easily interpretable for both the clinician and the patient. A bioinformatician, on the other hand, is often looking at biomarkers on a global scale, trying to identify multiple correlated biomarkers that may or may not be obvious clinical targets. Understanding the basic principles behind biomarker discovery and analysis will help these two groups communicate better when it comes time for data analysis and publication.

This module offers two computing pathways: AWS (Amazon Web Services) or GCP (Google Cloud Platform). Users can choose their preferred cloud service to run the Jupyter notebooks, ensuring flexibility and accessibility based on their existing infrastructure or familiarity. Detailed instructions for setting up and using either AWS or GCP for this module are provided in the corresponding folders of this repository. This module will cost about $1.00 to run, assuming you shut down and delete all resources when you are finished.

Watch this Introduction Video to learn more about the module.

Overview

This repository contains the files for a learning module covering concepts in biomarker discovery. The learning module consists of 9 submodules, each of which is a Jupyter Notebook running the R programming language. A basic knowledge of R and the R Bioconductor suite is helpful but not required. The submodules are organized as follows:

  • Submodule 1: Introduction to Biomarkers - Define what biomarkers are, identify the types of biomarkers, define properties of biomarkers that make them clinically useful, explore case studies of common clinical biomarkers.
  • Submodule 2: Introduction to R Data Structures - (Optional).
  • Submodule 3: Introduction to Linear Models - (Optional).
  • Submodule 4: Principles of Exploratory Analysis - (Optional).
  • Submodule 5: Rat Renal Ischaemia Reperfusion Injury Case Study - Introduce the rat renal ischaemia reperfusion injury (IRI) model used in this module.
  • Submodule 6: Linear and Logistic Regression for Comparison of Quantitative Biomarkers - Compare two known clinical biomarkers using linear regression to identify state changes, compare two biomarkers as binary classifiers using logistic regression, evaluate classification schemes using receiver operating characteristic (ROC) curves.
  • Submodule 7: Exploratory Analysis of Proteomics IRI Data - Normalize proteomics data for further analysis, identify and correct for batch effects in the data, explore trends in the data using dimensionality reduction methods such as principal component analysis, plot proteomics data using heatmaps.
  • Submodule 8: Identification of IRI Biomarkers from Proteomic Data - Perform differential analysis on proteomic data to identify potential biomarkers indicating state changes.
  • Submodule 9: Machine Learning Methods in Biomarker Discovery - Explore basic machine learning methods using the IRI proteomics data.

Submodules 2-4 cover optional background material and may be skipped by learners who do not need it.
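As a rough illustration of the kind of analysis Submodule 6 describes (logistic regression evaluated with an ROC curve), here is a minimal base R sketch. It is not the module's own code: the data are simulated, and the names `marker`, `injured`, `roc_points`, and `auc_trapezoid` are hypothetical. The notebooks may use dedicated packages for these steps.

```r
# Simulated data only: 'marker' and 'injured' are hypothetical names,
# not columns from the module's IRI dataset.
set.seed(42)
n <- 200
injured <- rbinom(n, size = 1, prob = 0.5)             # 0 = control, 1 = injured
marker  <- rnorm(n, mean = 1 + 1.5 * injured, sd = 1)  # marker tends to be higher when injured

# Logistic regression: model the probability of injury from the marker level
fit   <- glm(injured ~ marker, family = binomial)
score <- predict(fit, type = "response")               # predicted probabilities

# ROC curve: sweep a decision threshold from high to low and record the
# true positive rate and false positive rate at each threshold
roc_points <- function(score, label) {
  thresholds <- sort(unique(c(Inf, score)), decreasing = TRUE)
  tpr <- vapply(thresholds, function(t) mean(score[label == 1] >= t), numeric(1))
  fpr <- vapply(thresholds, function(t) mean(score[label == 0] >= t), numeric(1))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve by the trapezoidal rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(score, injured)
auc <- auc_trapezoid(roc)
print(round(auc, 3))
```

An AUC near 0.5 would mean the marker classifies no better than chance, while values closer to 1 indicate better separation of the two groups. To draw the curve, `plot(roc$fpr, roc$tpr, type = "l")` is enough for a quick look.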

Software Requirements

This module uses Jupyter Notebooks running R 4.2, with Bioconductor for bioinformatics data analysis and the tidy data principles implemented by the tidyverse package. A basic knowledge of R is helpful but not required for completing the module. Submodule 2 reviews the R data structures that are particularly relevant to regression analysis. Required R packages are installed within each submodule. Installation can take several minutes the first time the packages are installed. Key packages include tidyverse and Bioconductor packages such as limma. Prior understanding of these packages is not required, but users are encouraged to learn more about them before or after completing the module to better understand the commands used.
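Each submodule handles its own installation, so nothing below is required. For readers curious about what an "install only if missing" step can look like, here is a minimal sketch. The helper names `missing_packages` and `install_if_missing` are hypothetical, and the notebooks' actual installation code may differ.

```r
# Hypothetical helper: return the packages in 'pkgs' that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing through BiocManager,
# which can install both CRAN and Bioconductor packages
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) == 0) {
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager", repos = "https://cloud.r-project.org")
  }
  BiocManager::install(to_install, update = FALSE, ask = FALSE)
  invisible(to_install)
}

# Example call (left commented out because it downloads packages):
# install_if_missing(c("tidyverse", "limma"))
```

Checking first avoids repeating the multi-minute installation every time a notebook is re-run.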

Jupyter Notebooks are run through your browser and have the file extension ipynb. Activate the notebook by double-clicking the file name and it will automatically open in your browser. Each notebook consists of markdown and code cells. Markdown cells are for text and figures and are there to guide you through the chapters. Code cells can be run by clicking the play arrow at the top of the screen or by hitting CTRL-ENTER. The code will run within the notebook and generate the appropriate output. You may freely change the code and re-run the block as often as you like. This is useful if you want to test different analysis models or modify figures.

Data

These tutorials use example sequence data procured from the laboratory of Dr. Nisanne Ghonem at the Department of Biomedical and Pharmaceutical Sciences, College of Pharmacy, University of Rhode Island. The relevant manuscripts can be found here and here.

Funding

This module was funded through an administrative supplement to the Rhode Island IDeA Network of Biomedical Research Excellence (RI-INBRE) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103430.

License for Data

Text and materials are licensed under a Creative Commons CC-BY-NC-SA license. The license allows you to copy, remix, and redistribute any of our publicly available materials, provided that you attribute the work (details in the license), do not make a profit from it, and share any adaptations under the same license. More information is available here.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License