This section covers standard preprocessing: peak calling for ATAC-Seq, transcript abundance quantification for mRNA-Seq, and quality control analysis for both.

The pipeline was originally developed to analyze ATAC-Seq data, so more quality controls are performed for ATAC-Seq than for mRNA-Seq. An important difference between the two is that reads are aligned for ATAC-Seq data analysis, while an alignment-free method is used for mRNA-Seq. As a result, preprocessing is much faster for mRNA-Seq data, but fewer quality controls are available. More quality control options for mRNA-Seq data may be added in future versions of Cactus if needed.

For mRNA-Seq data, transcript abundances are quantified with kallisto, and quality controls are performed with FastQC and MultiQC.
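As a rough illustration of the alignment-free quantification step, the sketch below builds a `kallisto quant` command line in Python. The helper name `kallisto_quant_cmd`, the file names, and the default values are hypothetical; the options Cactus actually passes to kallisto are not described in this section and may differ.

```python
def kallisto_quant_cmd(index, out_dir, fastqs, threads=1,
                       single_end=False, frag_len=None, frag_sd=None):
    """Build the argument list for a `kallisto quant` call.

    Hypothetical helper for illustration only; it does not run kallisto.
    """
    cmd = ["kallisto", "quant",
           "-i", str(index),      # transcriptome index built with `kallisto index`
           "-o", str(out_dir),    # output directory for the abundance tables
           "-t", str(threads)]    # number of threads
    if single_end:
        # kallisto requires the fragment length mean and sd for single-end reads
        if frag_len is None or frag_sd is None:
            raise ValueError("single-end mode needs frag_len and frag_sd")
        cmd += ["--single", "-l", str(frag_len), "-s", str(frag_sd)]
    cmd += [str(f) for f in fastqs]
    return cmd


# Example with made-up file names (paired-end reads)
print(" ".join(kallisto_quant_cmd("transcripts.idx", "quant_out",
                                  ["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
                                  threads=4)))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where kallisto is installed.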

ATAC-Seq preprocessing mostly follows the first version of the guidelines from the Harvard Faculty of Arts and Sciences. The key steps are: read merging, trimming, alignment, and filtering (low-quality reads, duplicates, mitochondrial reads, and reads on small contigs); read shifting (transposase shift); and peak calling (with MACS2), splitting, and filtering (blacklisted regions, input control, specific regions). Quality controls are performed with published tools (FastQC, MultiQC, deepTools for read profiles and correlation, ChIPseeker for the distribution of annotated peaks) and in-house scripts (saturation curve, read overlap with genomic regions, ...).
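The transposase shift mentioned above can be sketched as follows. This is a minimal illustration that assumes the commonly used Tn5 offsets (+4 bp for reads on the plus strand, -5 bp for reads on the minus strand) and shifts the whole read interval; the function name `shift_tn5` is hypothetical, and the exact offsets and implementation used by Cactus are not stated in this section.

```python
def shift_tn5(start, end, strand, plus_offset=4, minus_offset=5):
    """Shift a read interval to account for the Tn5 transposase offset.

    Coordinates are 0-based, half-open (BED-like). The default offsets
    are the commonly cited +4/-5 values and are an assumption here.
    """
    if strand == "+":
        delta = plus_offset
    elif strand == "-":
        delta = -minus_offset
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    # Clamp at 0 so reads near the contig start keep valid coordinates
    return max(0, start + delta), max(0, end + delta)


# Example with made-up coordinates
print(shift_tn5(100, 150, "+"))  # (104, 154)
print(shift_tn5(100, 150, "-"))  # (95, 145)
```

Some tools shift only the fragment ends instead of whole reads; which convention fits depends on the downstream peak caller settings.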