FASTQ

FASTQ is a text-based format for storing biological sequences and their corresponding quality scores.

Have a look at one of the FASTQ files for the workshop:

zcat ~ngs00/data/mouse_cns_E18_rep1_1.fastq.gz | head -4

@HWI-ST985:73:C08BWACXX:8:1101:1920:2006 1:N:0: (1)
NTGCTCGGCCTCTTTCAGCTGTTTCTGCAGCTGCTGAATATCACTGTCTCTCTTCTCTACTTCTTTCTCTAAAGCCTGCATTTCGTGGTGAACTTTTCCCT (2)
+ (3)
#1=DDFFFHHHHHJJHIIJJJJJJJJJJJIIJGIJJJGIFIGEIGIGHIIIJJIJJIJIJIJJJJIJHJJHHIIIHHHGHHDFBCDCDBCCDDDDDDDDDD (4)

(1) sequence id: begins with the @ character, followed by a sequence identifier and an optional description
(2) raw sequence letters
(3) separator: begins with the + character, optionally followed by the same sequence identifier (and any description) again
(4) quality: encodes the quality values for the sequence and must contain the same number of symbols as there are letters in the sequence
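The four-line layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name `parse_fastq` is made up for illustration, and it assumes every record occupies exactly four lines (no wrapped sequences).

```python
from itertools import islice


def parse_fastq(lines):
    """Yield (identifier, sequence, quality) for each four-line FASTQ record.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: records are not line-wrapped (exactly four lines each).
    """
    lines = iter(lines)
    while True:
        # Take the next four lines and strip their line endings.
        record = [line.rstrip("\r\n") for line in islice(lines, 4)]
        if not any(record):
            return  # end of input (or only blank lines remain)
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, quality = record
        # Rely on line position, not only on the leading character:
        # a quality line may itself begin with '@'.
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality
```

For a compressed file, the handle returned by `gzip.open(path, "rt")` from the standard library can be passed directly as `lines`.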

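Note	The fourth line can also begin with `@`, because `@` (ASCII 64) is a valid quality symbol in both encodings (see below). A leading `@` alone therefore does not reliably mark the start of a record.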

Quality encoding

A quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect. The most widely used mapping is the Phred quality score:

$Q_{\text{phred}} = -10 \log_{10}(p)$
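As a rough illustration of the formula, the sketch below converts between an error probability and a Phred score. The function names are chosen here for illustration only. A score of 20 corresponds to an error probability of about 1 in 100, and a score of 30 to about 1 in 1000.

```python
import math


def error_prob_to_phred(p):
    """Return the Phred score Q = -10 * log10(p) for an error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def phred_to_error_prob(q):
    """Invert the Phred formula: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
```

The first function returns a float. Since stored quality values are integers, a caller would typically apply `round()` to the result.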

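Each quality value is stored as a single printable ASCII character whose code is the Phred score plus a fixed offset. Two offsets are in use:

offset	max Phred score range	max ASCII range	real-world Phred score range	real-world ASCII range
33	0 - 93	33 - 126	0 - 40	33 - 73
64	0 - 62	64 - 126	0 - 40	64 - 104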
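To make the offsets concrete, the following sketch decodes a quality string into Phred scores and tries to guess which offset a string uses. Both function names are hypothetical, and the guess is only a heuristic. It assumes that offset-33 data does not go above `J` (ASCII 74, Phred 41), which is slightly more permissive than the 0 - 40 range in the table.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    scores = [ord(symbol) - offset for symbol in quality]
    if any(score < 0 for score in scores):
        raise ValueError(f"symbols below ASCII {offset}: wrong offset?")
    return scores


def guess_offset(quality):
    """Guess the offset (33 or 64) of a non-empty quality string.

    Returns None when the symbols are compatible with both encodings.
    """
    codes = [ord(symbol) for symbol in quality]
    if min(codes) < 64:
        return 33  # symbols below ASCII 64 cannot occur with offset 64
    if max(codes) > 74:
        return 64  # heuristic assumption: offset-33 data stays at or below 'J'
    return None
```

Applied to the start of the quality line shown earlier, `decode_quality("#1=DD")` gives `[2, 16, 28, 35, 35]`. That line contains `#` (ASCII 35), so the example file must use offset 33.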

FASTQ quality encoding on Wikipedia
