This app runs modules from the Picard Tools suite (picard.jar v2.22.2 bundled in resources/) to generate quality-control (QC) statistics from mapped/aligned reads. Specifically, this app (for capture panels):
- Calculates multiple summary statistic metrics for mapped reads (paired or unpaired) using Picard CollectMultipleMetrics.
- Calculates mapping metrics to assess the performance of the capture kit by measuring coverage across all targets in the kit, using Picard CollectHsMetrics.
and for amplicon panels:
- Calculates mapping metrics to assess the performance of the amplicon kit by measuring coverage across all targets in the kit, using Picard CollectTargetedPcrMetrics.
and for RNA samples:
- Produces RNA alignment metrics for a SAM or BAM file using Picard [CollectRnaSeqMetrics](http://broadinstitute.github.io/picard/command-line-overview.html#CollectRnaSeqMetrics).
For more information on the Picard Tools suite see: http://broadinstitute.github.io/picard/
This app is designed to be run on aligned sequencing data, either as a standalone app or as part of a DNAnexus workflow.
The QC metrics calculated by Picard tools and output by this app are informative of the quality of the sequence alignments produced by read mapping software such as BWA and Bowtie2.
The outputs of this app are intended to be visualised with MultiQC and checked for inconsistencies across the alignment summary. This summary contains the per-cycle base distribution, target enrichment and read duplication statistics.
The following inputs are required for this app to run:
sorted_bam:
A coordinate-sorted mapping file in BAM format (*.bam). BAM files generated by commonly used mappers such as BWA, BWA-MEM, Bowtie, TopHat, HISAT and NovoAlign are acceptable as input.
fasta_index:
A reference genome sequence index generated by "FASTA indexer (with Picard and Samtools)". Make sure to use the same genome that was used to generate the BAM. This should be provided as a gzipped tar archive file (*.fasta-index.tar.gz).
enrichment_method:
A boolean flag that denotes whether the run is a capture-based or an amplicon-based panel.
Hybridisation = True will run CollectMultipleMetrics and CollectHsMetrics.
Hybridisation = False will run CollectTargetedPcrMetrics.
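The branching on the enrichment flag can be sketched as follows. This is a minimal illustration, not the app's actual source: the function name `build_picard_commands`, its parameters, and the `PICARD_JAR` path are hypothetical, and passing the same interval list as both bait and target intervals is an assumption. The output file names follow the patterns listed in this README, and the Picard arguments use the `KEY=VALUE` syntax accepted by Picard 2.x.

```python
from typing import List

# Hypothetical location of the bundled jar; adjust to the real path.
PICARD_JAR = "picard.jar"


def build_picard_commands(
    bam: str,
    reference: str,
    intervals: str,
    prefix: str,
    is_capture_panel: bool,
) -> List[List[str]]:
    """Return the Picard command lines implied by the enrichment flag.

    is_capture_panel=True  -> CollectMultipleMetrics and CollectHsMetrics
    is_capture_panel=False -> CollectTargetedPcrMetrics
    """
    base = ["java", "-jar", PICARD_JAR]
    if is_capture_panel:
        return [
            base + [
                "CollectMultipleMetrics",
                f"I={bam}",
                f"R={reference}",
                f"O={prefix}",  # used as a base name for several output files
            ],
            base + [
                "CollectHsMetrics",
                f"I={bam}",
                f"R={reference}",
                f"O={prefix}.hsmetrics.tsv",
                # Assumption: one interval list serves as both baits and targets.
                f"BAIT_INTERVALS={intervals}",
                f"TARGET_INTERVALS={intervals}",
                f"PER_TARGET_COVERAGE={prefix}.pertarget_coverage.tsv",
            ],
        ]
    return [
        base + [
            "CollectTargetedPcrMetrics",
            f"I={bam}",
            f"R={reference}",
            f"O={prefix}.targetPCRmetrics.txt",
            # Assumption: one interval list serves as both amplicons and targets.
            f"AMPLICON_INTERVALS={intervals}",
            f"TARGET_INTERVALS={intervals}",
            f"PER_TARGET_COVERAGE={prefix}.perTargetCov.txt",
        ]
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`; returning the commands instead of running them keeps the selection logic easy to inspect.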
The following files are optional for this app to run:
bedfile: A BED file defining the enriched regions.
refflat_file: A refflat file containing reference transcripts.
All Picard statistics files produced by this app are uploaded to a 'QC' directory in the DNAnexus project or working directory from which the app was called.
Picard CollectMultipleMetrics output files:
*.base_distribution_by_cycle*
- the base distribution per cycle
*.alignment_summary*
- a summary of the alignment
*.quality_by_cycle*
- the base quality per cycle
*.insert_size*
- metrics for validating library construction including the insert size distribution and read orientation of paired-end libraries
*.quality_distribution*
- the range of quality scores and the total numbers of bases corresponding to those scores
Picard CollectHsMetrics output files:
*.hsmetrics.tsv
- general statistics about the enrichment process.
*.pertarget_coverage.tsv
- the GC content and average coverage of each target in the kit.
Picard CollectTargetedPcrMetrics output files:
*.targetPCRmetrics.txt
- A summary of the performance of the target amplicons.
*.perTargetCov.txt
- A per-amplicon summary of %GC and coverage.
Picard CollectRnaSeqMetrics output files:
*.RNAmetrics.tsv
- A summary of the metrics describing the distribution of the bases within the transcripts.
Detailed information about the metrics reported by all Picard suites can be found at the following page: https://broadinstitute.github.io/picard/picard-metric-definitions.html
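For downstream checks outside MultiQC, the metrics table in these files can be read with a few lines of Python. The sketch below assumes the usual Picard metrics layout: comment lines starting with `#`, a `## METRICS CLASS` line, a tab-separated header row, one or more value rows, and a blank line before any histogram section. The function name `parse_picard_metrics` is hypothetical and is not part of this app.

```python
from typing import Dict, List


def parse_picard_metrics(text: str) -> List[Dict[str, str]]:
    """Extract the metrics table from the text of a Picard metrics file.

    Returns one dict per value row, keyed by column name. Values are left
    as strings so the caller decides how to convert them.
    """
    rows: List[Dict[str, str]] = []
    header = None
    in_metrics = False
    for line in text.splitlines():
        if line.startswith("## METRICS CLASS"):
            in_metrics = True
            continue
        if not in_metrics:
            continue  # still in the leading comment block
        if not line.strip() or line.startswith("#"):
            break  # a blank line or a new section ends the metrics table
        fields = line.split("\t")
        if header is None:
            header = fields  # first line after METRICS CLASS holds column names
        else:
            rows.append(dict(zip(header, fields)))
    return rows
```

For example, reading a `*.hsmetrics.tsv` file with `parse_picard_metrics(open(path).read())` would yield a single-row list whose keys are the column names documented on the metric definitions page.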
This app downloads the given input files and uses the BAM and BED files to create a Picard interval_list file. Depending on the capture/amplicon flag, it then calls either Picard CollectTargetedPcrMetrics, or both Picard CollectMultipleMetrics and Picard CollectHsMetrics.
All output files are uploaded into the directory 'QC'.
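The interval list step mentioned above can be illustrated with a short sketch. A Picard interval_list file is a SAM-style header followed by tab-separated rows of sequence name, 1-based inclusive start, end, strand and interval name, whereas BED starts are 0-based. The code below assumes the header text has already been extracted from the BAM (for example with `samtools view -H`). The function names are hypothetical and the app itself may build the file differently.

```python
from typing import Iterable, List


def bed_to_interval_lines(bed_lines: Iterable[str]) -> List[str]:
    """Convert BED records to interval_list body lines."""
    out: List[str] = []
    for raw in bed_lines:
        line = raw.rstrip("\n")
        # Skip blank lines and BED comment/track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED start is 0-based; interval_list start is 1-based inclusive.
        one_based_start = start + 1
        # Fall back to a coordinate-based name if the BED has no name column.
        if len(fields) > 3 and fields[3]:
            name = fields[3]
        else:
            name = f"{chrom}:{one_based_start}-{end}"
        # Assumption: default to the forward strand when none is given.
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
        out.append("\t".join([chrom, str(one_based_start), str(end), strand, name]))
    return out


def build_interval_list(sam_header: str, bed_lines: Iterable[str]) -> str:
    """Join a SAM header and converted BED records into interval_list text."""
    header = sam_header.rstrip("\n")
    body = "\n".join(bed_to_interval_lines(bed_lines))
    return f"{header}\n{body}\n"
```

Picard also ships a BedToIntervalList tool that performs this conversion using a sequence dictionary, which may be preferable to hand-rolled code in production use.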