introduction_MCF10A.Rmd

---
title: "MCF10A Breast Carcinogenome Portal"
output:
  html_document:
  toc: true
  theme: united
---

An interactive portal for retrieving and visualizing data from mammary cell carcinogenome project

### Dataset details

This experiment uses 345 selected chemicals for breast carcinogenicity testing, including 120 breast carcinogens, 114 non-carcinogens, and 68 miscellaneous chemicals (e.g. nuclear receptor ligands, BU SRP chemicals, lung carcinogens). Chemical carcinogenicity and genotoxicity annotations are based on the Carcinogenicity Potency Database ([CPDB](https://toxnet.nlm.nih.gov/cpdb/)), which is the result of tissue-specific long-term animal cancer tests in rodents, or breast carcinogens published from [Rudel et. al., 2007](https://onlinelibrary.wiley.com/doi/10.1002/cncr.22653/pdf). In the CRCGN project, MCF10A (breast epithelial) cells are exposed to each individual chemical for 24 hours and their gene expression is profiled on the L1000 platform. Each chemical is assayed at 3 doses (3 fold dilutions starting from the highest concentration of 100uM, with the exception of selected BUSRP chemicals, which had custom chosen starting concentrations) with triplicate profiles generated for each dose. This experiment is repeated also on MCF10A TP53-/- cells. For each chemical and dose profile, the gene expression of 1000 landmark genes is calculated as a moderated z-score (weighted collapsed z-score of the 3 replicate perturbational profiles with respect to the entire plate).

### Annotation

Tables detailing the chemicals/samples in the HEPG2 carcinogenome dataset and their associated annotation.

### Chemical Explorer

Interactive explorer for a single queried chemical. 

#### Gene Expression

A list of differentially expressed genes for a given chemical of interest.

#### Gene set enrichment

Gene set enrichment scores for a given chemical of interest. Gene sets include the [MSigDB](http://software.broadinstitute.org/gsea/msigdb) collections (Hallmark, C2 reactome pathways), and gene targets of various nuclear receptors ([NURSA](https://www.nursa.org/nursa/transcriptomine/index.jsf)). Enrichment scores were computed based on multiple methods, including gsva, ssGSEA, zscore (from R Bioconductor package, [GSVA](https://bioconductor.org/packages/release/bioc/html/GSVA.html)), and gsproj (custom script).

#### Connectivity

Connectivity scores measure similarity of profiles of the query chemical to profiles in the Connectivity Map (CMap).
Scores are calculated at the level of either CMap Perturbagens or Perturbational Classes.

### Marker Explorer

Interactive explorer for a single queried markers (gene, gene-set, or CMap perturbagen/perturbagen class). 

### Heatmap Explorer

A heatmap visualizer to explore bulk visualization of gene expression, gene set enrichment, and connectivity 
results.

A interactive heatmap using Morpheus is supported for querying gene set enrichment results. For details, see [Morpheus](https://software.broadinstitute.org/morpheus/).

---

Credits: Amy Li, Stefano Monti, David Sherr, Broad Institute CMap team.

Contact us at [ajli@bu.edu](mailto:ajli@bu.edu)

This project is supported by Superfund Research Program at Boston University ([BUSRP](http://www.bu.edu/sph/research/research-landing-page/superfund-research-program-at-boston-university)), [NIH/LINCS](http://www.lincsproject.org), and [Find the Cause Breast Cancer Foundation](http://findthecausebcf.org/).