fstrozzi edited this page Apr 27, 2012 · 24 revisions

Intro

Bio::Faster is a BioRuby gem that implements a fast and simple parser for FastQ files. The new version drops support for plain FastA files to focus on the more resource-demanding FastQ parsing. It is a complete rewrite of the old one: the C extension has been written from scratch, and the parser now also checks for formatting problems in FastQ files. A full set of RSpec tests has been defined, based on the test files available in the official FastQ paper.

Usage

The Bio::Faster class is instantiated with the file name, and the each_record method is then used to parse the whole file. For each record it yields the sequence header (ID and comment), the sequence itself and an array of quality values. The default quality encoding is expected to be Sanger (Phred33).

fastq = Bio::Faster.new("sequences.fastq")
fastq.each_record do |sequence_header, sequence, quality|
  puts sequence_header, sequence, quality
end
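To make the yielded values concrete, here is a minimal pure-Ruby sketch of what a four-line FastQ record contains and how it maps to header, sequence and quality array. The `each_fastq_record` helper is hypothetical and is not part of Bio::Faster (whose parser is a C extension); the format checks shown are only an assumption about the kind of problems a FastQ parser might look for.

```ruby
require "stringio"

# Hypothetical helper, not part of Bio::Faster: walks a FastQ stream four
# lines at a time and yields header, sequence and quality scores per record.
def each_fastq_record(io, offset = 33)
  until io.eof?
    header, sequence, separator, quality = Array.new(4) { io.gets&.chomp }

    # Basic format checks: complete record, '@' header, '+' separator,
    # and one quality character per base.
    raise ArgumentError, "truncated record" if quality.nil?
    raise ArgumentError, "header must start with '@'" unless header.start_with?("@")
    raise ArgumentError, "separator must start with '+'" unless separator.start_with?("+")
    raise ArgumentError, "sequence/quality length mismatch" unless sequence.length == quality.length

    # Decode each quality character into a Phred score (offset 33 = Sanger).
    yield header[1..-1], sequence, quality.bytes.map { |byte| byte - offset }
  end
end

data = StringIO.new("@read1 sample comment\nACGT\n+\nII5!\n")
records = []
each_fastq_record(data) { |header, sequence, quality| records << [header, sequence, quality] }
```

Here `records` holds one entry with the header text after the `@`, the sequence `ACGT`, and one integer score per base.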

If the quality encoding is Phred64 (i.e. Solexa), you need to specify it:

fastq_solexa = Bio::Faster.new("sequences.fastq", :solexa)
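Why the encoding matters can be sketched in a few lines of plain Ruby. The `decode_quality` helper below is hypothetical (it is not the gem's API), and it assumes that the two encodings differ only in the ASCII offset subtracted from each quality character, 33 or 64, as the names Phred33 and Phred64 suggest.

```ruby
# Hypothetical helper: converts a FastQ quality string into Phred scores
# by subtracting the ASCII offset of the encoding.
def decode_quality(quality_string, offset)
  quality_string.bytes.map { |byte| byte - offset }
end

sanger_scores  = decode_quality("II5!", 33) # Phred33 string, correct offset
phred64_scores = decode_quality("hhTB", 64) # Phred64 string, correct offset
wrong_scores   = decode_quality("hhTB", 33) # Phred64 string, wrong offset
```

Decoding a Phred64 string with the default offset of 33 shifts every score up by 31, which is why the encoding has to be declared when the file is not Sanger-encoded.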

The each_record method can also read directly from STDIN, which is useful when dealing with compressed FastQ files.

Just specify :stdin as the input:

Bio::Faster.new(:stdin).each_record do |seq|
...

and you can call the Ruby script with pipes in a standard Unix terminal:

zcat sequences.fastq.gz | ruby my_parser.rb

So you can read gzipped files without any drop in the parser performance.

Performance

This is a comparison of the time needed to parse a 5.4 Gb FastQ file.

Using Bio::Faster:

Bio::Faster.new("test_file.fastq").each_record {|sequence_header, sequence, quality|}
real	3m55.870s
user	3m51.767s
sys	0m4.055s

Using the standard BioRuby parser:

Bio::FlatFile.open(Bio::Fastq,File.open("test_file.fastq")).each_entry {|seq|}
real	11m35.946s
user	11m26.762s
sys	0m7.764s

Bio::Faster is almost 3X faster than the standard object-oriented BioRuby FastQ parser (about 3m56s versus 11m36s of wall-clock time).
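The speed-up can be checked directly from the two `real` timings reported above. This is a small sketch; the `to_seconds` helper is hypothetical and only parses the `XmY.YYYs` format printed by `time`.

```ruby
# Hypothetical helper: converts a `time`-style "XmY.YYYs" string into seconds.
def to_seconds(time_string)
  minutes, seconds = time_string.match(/\A(\d+)m([\d.]+)s\z/).captures
  minutes.to_i * 60 + seconds.to_f
end

bio_faster_seconds = to_seconds("3m55.870s")  # Bio::Faster, real time
bioruby_seconds    = to_seconds("11m35.946s") # Bio::FlatFile, real time

# Ratio of wall-clock times, rounded to two decimals.
speedup = (bioruby_seconds / bio_faster_seconds).round(2)
puts "Speed-up: #{speedup}x"
```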
