Note: This page is automatically generated; any edits will be overwritten.

Repository information

Data files generated for the Illumina sequencing platform

This document describes the data files generated for the Illumina sequencing platform. It includes details of files created by current and previous generations of instrument and analysis pipelines, including file types that are no longer created. Common current file types are indicated in the table below.

File naming

The files are named using a convention so that related files share a name prefix. In the table below, where a file is described as being related to a "corresponding" file, the two files share the same prefix (e.g. [prefix].cram and [prefix].cram.crai).
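
The shared-prefix convention can be illustrated with a minimal Python sketch. The function names `related_files` and `index_name` and the prefix `sampleA` are hypothetical, chosen only for illustration. The sketch assumes the prefix is known in advance and is always followed by `.` or `_` in the file name, which matches the patterns listed in the table but is not stated as a rule by this document.

```python
from pathlib import PurePosixPath


def related_files(prefix, paths):
    """Return the paths whose file name is `prefix` followed by '.' or '_'.

    Assumption: related files look like [prefix].ext or [prefix]_suffix.ext,
    optionally inside a directory such as qc/.
    """
    matches = []
    for path in paths:
        name = PurePosixPath(path).name  # drop any directory part, e.g. "qc/"
        separator = name[len(prefix):len(prefix) + 1]
        if name.startswith(prefix) and separator in (".", "_"):
            matches.append(path)
    return matches


def index_name(data_file):
    """Return the index file name expected alongside a BAM or CRAM file."""
    if data_file.endswith(".cram"):
        return data_file + ".crai"
    if data_file.endswith(".bam"):
        return data_file + ".bai"
    raise ValueError(f"not a BAM or CRAM file: {data_file}")


if __name__ == "__main__":
    listing = [
        "sampleA.cram",
        "sampleA.cram.crai",
        "sampleA_F0x900.stats",
        "qc/sampleA.adapter.json",
        "sampleB.cram",
    ]
    print(related_files("sampleA", listing))
    print(index_name("sampleA.cram"))
```

Matching on the separator character, rather than on the bare prefix, avoids treating a file such as `sampleAB.cram` as related to `sampleA`. A prefix that itself ends where another prefix continues with `_` would still be ambiguous, so this is a sketch rather than a complete resolver.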

| File | Common Current | Description | URL |
| --- | --- | --- | --- |
| RunInfo.xml | | XML file containing information about the run. Created by Illumina software. | |
| runParameters.xml | | XML file containing information about run parameters and run components. Created by Illumina software. | |
| RunParameters.xml | | XML file containing information about run parameters and run components. Created by Illumina software. | |
| [prefix].bam | | BAM file containing sequenced reads, usually aligned to a reference sequence. | |
| [prefix].bam.bai | | BAM index for the corresponding BAM file. | |
| [prefix].composition.json | | JSON file containing metadata describing the run(s), position(s), tag index(es) and subset(s) of sequence data in the corresponding BAM or CRAM file. | |
| [prefix].bam_stats | | Text file containing summary data for the corresponding BAM file. Created with bam_stats. | https://github.com/cancerit/PCAP-core |
| [prefix].bamcheck | | Text file containing summary data for the corresponding BAM file. Created with bamcheck (circa 2012). | https://github.com/samtools/samtools |
| [prefix].bqsr_table | | Text file containing Base Quality Score Recalibration (BQSR) information. Created by GATK. | https://gatk.broadinstitute.org |
| [prefix].cram | | CRAM file containing sequenced reads, usually aligned to a reference sequence. | |
| [prefix].cram.crai | | CRAM index for the corresponding CRAM file. | |
| [prefix].deletions.bed | | BED file reporting deletions. Created by Tophat2. | https://github.com/infphilo/tophat |
| [prefix].flagstat | | Text file containing read counts. Created by samtools flagstat. | https://github.com/samtools/samtools |
| [prefix].g.vcf.gz | | GVCF file containing HaplotypeCaller results. Created by GATK. | https://gatk.broadinstitute.org |
| [prefix].g.vcf.gz.tbi | | TABIX index for the corresponding GVCF file. | |
| [prefix].insertions.bed | | BED file reporting insertions. Created by Tophat2. | https://github.com/infphilo/tophat |
| [prefix].junctions.bed | | BED file reporting splice junctions. Created by Tophat2. | https://github.com/infphilo/tophat |
| [prefix].junctions.tab | | Tab-delimited text file describing splice junctions. Created by STAR aligner. | https://github.com/alexdobin/STAR |
| [prefix].markdups_metrics.txt | | Text file containing information about read duplication. Created by bamstreamingmarkduplicates. | https://github.com/gt1/biobambam2 |
| [prefix].orig.seqchksum | | Text file containing a file format-agnostic sequence checksum for all reads. Created by bamseqchksum. | https://github.com/gt1/biobambam2 |
| [prefix].readspergene.tab | | Tab-delimited text file describing read counts per gene. Created by STAR aligner. | https://github.com/alexdobin/STAR |
| [prefix].seqchksum | | Text file containing a file format-agnostic sequence checksum for a subset of reads. Created by bamseqchksum. | https://github.com/gt1/biobambam2 |
| [prefix].sha512primesums512.seqchksum | | Text file containing a file format-agnostic sequence checksum for all reads (sha512primesums512 hash). Created by bamseqchksum. | https://github.com/gt1/biobambam2 |
| [prefix].spatial_filter.stats | | Text file containing spatial filtering statistics. Created by bambi spatial_filter. | https://github.com/wtsi-npg/bambi |
| [prefix].substitution_analysis.txt | | Text file containing a substitution error table. Created by bambi substitution_analysis. | https://github.com/wtsi-npg/bambi |
| [prefix].substitution_metrics.txt | | Text file containing analysis results derived from bambi substitution_analysis output. Created by npg_substitution_metrics.pl. | https://github.com/wtsi-npg/npg_qc |
| [prefix].vcf | | VCF file containing variant calls for the corresponding CRAM file. Created by bcftools. | https://github.com/samtools/ |
| [prefix]_F0x900.stats | | Text file containing statistics for a filtered subset of reads. Created by samtools stats. | https://github.com/samtools/ |
| [prefix]_F0xB00.stats | | Text file containing statistics for a filtered subset of reads. Created by samtools stats. | https://github.com/samtools/ |
| [prefix]_F0xF04_target.stats | | Text file containing statistics for a filtered subset of reads, targeted at a subset of regions. Created by samtools stats. | https://github.com/samtools/ |
| [prefix]_F0xF04_target_autosome.stats | | Text file containing statistics for a filtered subset of reads, targeted at a subset of regions. Created by samtools stats. | https://github.com/samtools/ |
| [prefix]_quality_cycle_caltable.txt | | Text file containing calibration information for recalibrating Illumina base qualities. Created by pb_calibration. | https://github.com/wtsi-npg/pb_calibration |
| [prefix]_quality_cycle_surv.txt | | Text file containing calibration information for recalibrating Illumina base qualities. Created by pb_calibration. | https://github.com/wtsi-npg/pb_calibration |
| [prefix]_quality_error.txt | | Text file containing calibration information for recalibrating Illumina base qualities. Created by pb_calibration. | https://github.com/wtsi-npg/pb_calibration |
| [prefix]_salmon.quant.zip | | Zip archive containing the results of transcript quantification analysis for the corresponding CRAM file. Created by salmon. | https://github.com/COMBINE-lab/salmon |
| qc/[prefix].adapter.json | | JSON file containing sequencing adapter match information for the corresponding CRAM file. Created by npg_qc::autoqc::checks::adapter. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].alignment_filter_metrics.json | | JSON file containing statistics for splitting sequence files by alignment for the corresponding CRAM file. Created by npg_qc::autoqc::checks::alignment_filter_metrics. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].bam_flagstats.json | | JSON file containing statistics generated for the corresponding CRAM file by samtools stats. Created by npg_qc::autoqc::checks::bam_flagstats. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].gatk_collecthsmetrics.txt | | Text file containing joint genotyping information for the corresponding CRAM file, for one or more samples pre-called with HaplotypeCaller. Created by Picard CollectHsMetrics. | https://gatk.broadinstitute.org |
| qc/[prefix].gc_bias.json | | JSON file containing GC bias information for the corresponding BAM file. Created by npg_qc::autoqc::results::gc_bias. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].gc_fraction.json | | JSON file containing GC base composition information for the corresponding CRAM file. Created by npg_qc::autoqc::results::gc_fraction. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].genotype.json | | JSON file containing expected sample identity information for the corresponding CRAM file, determined by genotype. Created by npg_qc::autoqc::results::genotype. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].insert_size.json | | JSON file containing insert size distribution information for the corresponding CRAM file. Created by npg_qc::autoqc::results::insert_size. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].pulldown_metrics.json | | JSON file containing pulldown results for the corresponding CRAM file, obtained from Picard CollectHsMetrics. Created by npg_qc::autoqc::results::pulldown_metrics. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].qX_yield.json | | JSON file containing information on the number of bases at a set of quality values (20, 30, 40) for the corresponding CRAM file. Created by npg_qc::autoqc::checks::qX_yield. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].ref_match.json | | JSON file containing the results of a sample contamination check based on read alignment, for the corresponding CRAM file. Created by npg_qc::autoqc::checks::ref_match. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].rna_seqc.json | | JSON file containing the results obtained from the Broad Institute's RNA-SeQC, for the corresponding CRAM file. Created by npg_qc::autoqc::checks::rna_seqc. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].sequence_error.json | | JSON file containing information on the quality of alignment to a reference sequence for the corresponding CRAM file. Created by npg_qc::autoqc::checks::sequence_error. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].sequence_summary.json | | JSON file containing summary information such as BAM/CRAM header and checksum for the corresponding CRAM file. Created by npg_qc::autoqc::results::sequence_summary. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].spatial_filter.json | | JSON file containing spatial filtering statistics for the corresponding CRAM file. Created by npg_qc::autoqc::checks::spatial_filter. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].substitution_metrics.json | | JSON file containing the results of substitution analysis for the corresponding CRAM file. Created by npg_qc::autoqc::checks::substitution_metrics. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix].verify_bam_id.json | | JSON file containing the results of verifyBamID (https://github.com/Griffan/VerifyBamID) for the corresponding CRAM file. Created by npg_qc::autoqc::checks::verify_bam_id. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix]_F0x900.samtools_stats.json | | JSON file containing the results of samtools stats for a filtered subset of reads from the corresponding CRAM file. Created by npg_qc::autoqc::checks::samtools_stats. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix]_F0xB00.samtools_stats.json | | JSON file containing the results of samtools stats for a filtered subset of reads from the corresponding CRAM file. Created by npg_qc::autoqc::checks::samtools_stats. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix]_F0xF04_target.samtools_stats.json | | JSON file containing the results of samtools stats for a filtered subset of reads, targeted at a subset of regions, from the corresponding CRAM file. Created by npg_qc::autoqc::checks::samtools_stats. | https://github.com/wtsi-npg/npg_qc |
| qc/[prefix]_F0xF04_target_autosome.samtools_stats.json | | JSON file containing the results of samtools stats for a filtered subset of reads, targeted at a subset of regions, from the corresponding CRAM file. Created by npg_qc::autoqc::checks::samtools_stats. | https://github.com/wtsi-npg/npg_qc |
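
Several file names in the table carry a hexadecimal token such as F0x900, F0xB00 or F0xF04. The following Python sketch decodes such a token under the assumption, not stated explicitly in this document, that it is a SAM FLAG mask of reads excluded before samtools stats was run (as with a `-F` style filter). The bit values and flag names are those of the SAM specification; the function name `excluded_flags` is hypothetical.

```python
import re

# SAM FLAG bits and their conventional names, in ascending bit order.
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def excluded_flags(filename):
    """Return the SAM flag names encoded in an `_F0x...` token of a file name.

    Assumption: the token is a mask of flags used to exclude reads.
    Returns an empty list when the name has no such token.
    """
    match = re.search(r"_F(0x[0-9A-Fa-f]+)", filename)
    if match is None:
        return []
    mask = int(match.group(1), 16)  # base 16 accepts the "0x" prefix
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


if __name__ == "__main__":
    for name in (
        "sampleA_F0x900.stats",
        "sampleA_F0xB00.stats",
        "qc/sampleA_F0xF04_target.samtools_stats.json",
    ):
        print(name, excluded_flags(name))
```

Under that assumption, F0x900 corresponds to secondary and supplementary alignments, F0xB00 adds reads failing quality checks, and F0xF04 additionally covers unmapped and duplicate reads.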