About deepTools
For instructions on using deepTools 2.0 or newer, please go here. This page applies only to deepTools 1.5.
We started deepTools because, in 2011, we could not find tools that met all our needs for NGS data analysis. While there were individual tools for separate tasks, we wanted software that would fulfill all of the following criteria:
- efficiently extract reads from BAM files and perform various computations on them
- turn BAM files of aligned reads into bigWig files using different normalization strategies
- make use of multiple processors (speed!)
- generate highly customizable images (change colours, size, labels, file format, etc.)
- enable customized downstream analyses, which requires that users can store every data set that is produced
- follow a modular approach for compatibility, flexibility and scalability (i.e. we can keep adding modules that make use of established methods)
The flow chart below depicts the different tool modules that are currently available within deepTools (deepTools modules are written in bold red and black font). For more information on a typical analysis pipeline, read the text below and What deepTools can do. If the file names in the figure mean nothing to you, please check our Glossary.
You will find many examples from ChIP-seq analyses in this tutorial, but this does not mean that deepTools is restricted to ChIP-seq data analysis. However, some tools, such as bamFingerprint, specifically address ChIP-seq issues.
Here are slides that we used for teaching at the University of Freiburg that contain more details on the deepTools usage and aims.
As shown in the flow chart above, our work usually begins with one or more FASTQ files of deeply sequenced samples. After a first quality control using FastQC, we align the reads to the reference genome, e.g. using bowtie2. We then use deepTools to assess the quality of the aligned reads:
- Correlation between BAM files (bamCorrelate). This is a very basic test to see whether the sequenced and aligned reads meet your expectations. We use this check to assess reproducibility, both between replicates and between different experiments that might have used the same antibody, the same cell type, etc. For instance, replicates should correlate better than differently treated samples.
- GC bias check (computeGCbias). Many sequencing protocols require several rounds of PCR-based amplification of the DNA to be sequenced. Unfortunately, most DNA polymerases used for PCR introduce significant GC biases as they prefer to amplify GC-rich templates. Depending on the sample and its preparation, the GC bias can vary significantly, so we routinely check its extent. When we need to compare files with different GC biases, we use the correctGCbias module to match the GC bias. See the paper by Benjamini and Speed for many insights into this problem.
- Assessing the ChIP strength (bamFingerprint). We do this quality control to get a feeling for the signal-to-noise ratio in samples from ChIP-seq experiments. It is based on the insights published by Diaz et al.
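The correlation check can be illustrated with a minimal sketch of the two statistics that bamCorrelate offers (Pearson and Spearman). It assumes that the reads of each BAM file have already been counted in the same set of genomic bins; the bin counting itself is not shown, the counts are made-up toy values, and the function names are hypothetical. This is not bamCorrelate's implementation.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long vectors of bin counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant vectors")
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical read counts of two replicates in the same six bins.
rep1 = [10, 12, 50, 3, 0, 8]
rep2 = [11, 14, 45, 2, 1, 9]
print(round(pearson(rep1, rep2), 3), spearman(rep1, rep2))
```

Because Spearman works on ranks, it is less sensitive than Pearson to a few bins with extreme counts; in this toy example both replicates order their bins identically, so the Spearman coefficient is 1.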
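To make the "expected vs. observed GC distribution" idea behind computeGCbias concrete, here is a minimal sketch. It assumes that reads are given as equal-length sequence strings and estimates the expected distribution from windows of read length drawn uniformly from a toy genome. computeGCbias itself works on BAM files, and its actual procedure (fragment handling, filtering, plotting) is not reproduced here; all function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter


def gc_bin_counts(seqs: list[str], n_bins: int = 10) -> Counter:
    """Count sequences per GC bin; bin b covers GC fractions [b/n_bins, (b+1)/n_bins)."""
    counts: Counter = Counter()
    for s in seqs:
        gc = sum(1 for base in s.upper() if base in "GC")
        # integer arithmetic avoids floating-point rounding at bin edges;
        # sequences with 100% GC are folded into the last bin
        counts[min(gc * n_bins // len(s), n_bins - 1)] += 1
    return counts


def sample_windows(genome: str, length: int, n: int, seed: int = 0) -> list[str]:
    """Draw n windows of the given length uniformly from the genome (expected distribution)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


def obs_exp_ratio(reads: list[str], genome: str, n_bins: int = 10,
                  n_samples: int = 10_000, seed: int = 0) -> dict[int, float]:
    """Observed/expected ratio of read fractions per GC bin (>1 = over-represented)."""
    length = len(reads[0])  # assumes all reads have the same length
    expected = gc_bin_counts(sample_windows(genome, length, n_samples, seed), n_bins)
    observed = gc_bin_counts(reads, n_bins)
    return {
        b: (observed[b] / len(reads)) / (expected[b] / n_samples)
        for b in range(n_bins)
        if expected[b]  # bins never seen in the genome sample have no defined ratio
    }


# Hypothetical toy data: every 4-bp window of this genome has 50% GC.
genome = "ACGT" * 250
reads = ["ACGT", "GCGC", "ATAT"]
print(obs_exp_ratio(reads, genome))
```

In the toy genome only the 50% GC bin occurs, and only one of the three reads falls into it, so that bin appears under-represented among the reads. On real data, a ratio that rises or falls steadily with GC content is the kind of bias the text describes.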
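The fingerprint idea from Diaz et al. can be sketched as follows, assuming the reads of a sample have already been counted in equally sized genomic bins. The counts are toy values, the function name is hypothetical, and this is not bamFingerprint's code. Bins are sorted by coverage, and the cumulative fraction of reads is plotted against the fraction of bins. Evenly spread reads, as expected for an input sample, give a near-diagonal line. A strong ChIP concentrates most reads in a small fraction of the bins, so its curve stays low and rises steeply at the end.

```python
from __future__ import annotations

from typing import Sequence


def fingerprint(counts: Sequence[int]) -> list[tuple[float, float]]:
    """Points (fraction of bins, cumulative fraction of reads), bins sorted by coverage."""
    total = sum(counts)
    n = len(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin with reads")
    points = []
    running = 0
    # ascending order: low-coverage bins first, highest-coverage bins last
    for i, c in enumerate(sorted(counts), start=1):
        running += c
        points.append((i / n, running / total))
    return points


# Hypothetical bin counts: an evenly covered input and a strongly enriched ChIP.
input_like = [5, 5, 5, 5]
chip_like = [0, 0, 1, 19]
print(fingerprint(input_like))  # lies on the diagonal
print(fingerprint(chip_like))   # 95% of the reads sit in the last 25% of bins
```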
Once we're satisfied with the basic quality checks, we normally convert the large BAM files into a leaner data format, typically bigWig. bigWig files have several advantages over BAM files that mainly stem from their significantly decreased size:
- useful for data sharing & storage
- intuitive visualization in Genome Browsers (e.g. IGV)
- more efficient downstream analyses are possible
The deepTools modules bamCompare and bamCoverage do more than simply convert BAM files to bigWig (or bedGraph, for that matter). The main reason we developed them was to normalize read coverages so that we could compare different samples despite differences in sequencing depth, GC bias and so on.
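Two of the ideas mentioned here can be shown in a minimal sketch, assuming per-bin read counts are already available. The first is scaling each sample to a common sequencing depth. Reads per million mapped reads is used here as one common choice; deepTools' own normalization options are not reproduced. The second is comparing two scaled samples with a log2 ratio, one of the operations listed for bamCompare. The pseudocount that avoids division by zero, its default value and all function names are illustrative assumptions.

```python
from __future__ import annotations

import math
from typing import Sequence


def scale_to_depth(counts: Sequence[float], total_reads: int,
                   per: int = 1_000_000) -> list[float]:
    """Rescale bin counts to reads per `per` mapped reads (RPM by default)."""
    factor = per / total_reads
    return [c * factor for c in counts]


def log2_ratio(treatment: Sequence[float], control: Sequence[float],
               pseudocount: float = 1.0) -> list[float]:
    """Per-bin log2((treatment + p) / (control + p)); p avoids division by zero."""
    if len(treatment) != len(control):
        raise ValueError("both samples must cover the same bins")
    return [math.log2((t + pseudocount) / (c + pseudocount))
            for t, c in zip(treatment, control)]


# Hypothetical example: the ChIP library is twice as deep as the input library.
chip = scale_to_depth([10, 40, 6], total_reads=2_000_000)   # [5.0, 20.0, 3.0]
control = scale_to_depth([4, 4, 4], total_reads=1_000_000)  # [4.0, 4.0, 4.0]
print(log2_ratio(chip, control))
```

Without the depth scaling, the raw counts (10 vs. 4) would overstate the enrichment in the first bin. After scaling, the comparison is only 5 vs. 4.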
Finally, once all the files have passed our visual inspections, the fun of downstream analyses with heatmapper and profiler can begin!
deepTools consists of a set of modules that can be used independently to work with mapped reads. We have subdivided such tasks into quality controls (QC), normalizations and visualizations.
Here's a concise summary of the tools - if you would like more detailed information about the individual tools and example figures, follow the links in the table.
tool | type | input files | main output file(s) | application |
---|---|---|---|---|
bamCorrelate | QC | 2 or more BAM | clustered heatmap | Pearson or Spearman correlation between read distributions |
bamFingerprint | QC | 2 BAM | 1 diagnostic plot | assess enrichment strength of a ChIP sample |
computeGCbias | QC | 1 BAM | 2 diagnostic plots | calculate the expected and observed GC distribution of reads |
correctGCbias | QC | 1 BAM, output from computeGCbias | 1 GC-corrected BAM | obtain a BAM file with reads distributed according to the genome's GC content |
bamCoverage | normalization | BAM | bedGraph or bigWig | obtain the normalized read coverage of a single BAM file |
bamCompare | normalization | 2 BAM | bedGraph or bigWig | normalize 2 BAM files to each other using a mathematical operation of your choice (e.g. log2ratio, difference) |
computeMatrix | visualization | 1 bigWig, 1 BED | zipped file, to be used with heatmapper or profiler | compute the values needed for heatmaps and summary plots |
heatmapper | visualization | computeMatrix output | heatmap of read coverages | visualize the read coverages for genomic regions |
profiler | visualization | computeMatrix output | summary plot ("meta-profile") | visualize the average read coverages over a group of genomic regions |
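The division of labour between computeMatrix, heatmapper and profiler can be sketched as follows. The sketch makes several assumptions. Coverage is available as a per-base list of values for each chromosome (a stand-in for the bigWig file), each BED region is reduced to a single reference position, and strand is ignored. computeMatrix's actual options and zipped output format are not reproduced, and the function names are hypothetical. Each matrix row is one region (the rows a heatmap would draw), and the column means form a summary profile of the kind profiler plots.

```python
from __future__ import annotations


def compute_matrix(coverage: dict[str, list[float]],
                   regions: list[tuple[str, int]],
                   flank: int = 1000, bin_size: int = 100) -> list[list[float]]:
    """One row per region: mean coverage in bins from pos - flank to pos + flank.

    Assumes flank is a multiple of bin_size; regions whose window runs off
    the chromosome are skipped.
    """
    n_bins = 2 * flank // bin_size
    matrix = []
    for chrom, pos in regions:
        track = coverage[chrom]
        start = pos - flank
        if start < 0 or pos + flank > len(track):
            continue
        row = []
        for b in range(n_bins):
            window = track[start + b * bin_size: start + (b + 1) * bin_size]
            row.append(sum(window) / bin_size)
        matrix.append(row)
    return matrix


def profile(matrix: list[list[float]]) -> list[float]:
    """Average each column (bin) over all regions: the 'meta-profile'."""
    if not matrix:
        raise ValueError("no regions left to average")
    return [sum(col) / len(matrix) for col in zip(*matrix)]


# Hypothetical toy coverage with a peak between positions 10 and 19.
cov = {"chr1": [0.0] * 10 + [5.0] * 10 + [0.0] * 10}
m = compute_matrix(cov, [("chr1", 15), ("chr1", 14), ("chr1", 2)], flank=10, bin_size=5)
print(m)           # the region at position 2 is skipped (its window starts before 0)
print(profile(m))
```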
deepTools is developed by the Bioinformatics Facility at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg. For troubleshooting, see our FAQ and get in touch: deeptools@googlegroups.com