This collection of short scripts forms a pipeline for detecting Influenza virus and extracting an accurate whole-genome consensus from clinical samples, tissue culture or passaged material. The pipeline was developed for use with Illumina paired-end data.
A key prerequisite is a properly formatted BLAST database: a separate database is required for each of the eight genome segments. A script is provided that builds these databases automatically once whole influenza genomes have been downloaded from http://www.fludb.org
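To illustrate the per-segment database layout, here is a minimal Python sketch (not the script shipped with the pipeline). It groups FASTA records by segment and builds a `makeblastdb` command for each group. The `Segment:N` header token, the function names and the file and database names are all assumptions made for illustration; the real header layout depends on the options chosen when downloading the genomes.

```python
import re
from collections import defaultdict

# ASSUMPTION: each FASTA header carries a token such as "Segment:4".
# Adjust the pattern to match the headers in your own download.
SEGMENT_PATTERN = re.compile(r"Segment:(\d)")


def split_by_segment(fasta_text):
    """Group FASTA records by segment number; returns {segment: fasta_text}."""
    groups = defaultdict(list)
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = SEGMENT_PATTERN.search(line)
            # Records without a recognisable segment token are skipped.
            current = int(match.group(1)) if match else None
        if current is not None and line.strip():
            groups[current].append(line)
    return {seg: "\n".join(lines) + "\n" for seg, lines in groups.items()}


def makeblastdb_command(fasta_path, db_name):
    """Build (but do not run) a makeblastdb call for a nucleotide database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]
```

Each value returned by `split_by_segment` could be written to its own file (for example a hypothetical `segment4.fasta`) and the matching command passed to `subprocess.run`.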
In the main script the following steps are run automatically:
- Map raw sequence data to host genome (BWA)
- Extract reads that do not map to the host (Samtools)
- Assemble non-host reads (Velvet)
- Identify closest match for each genome segment (BLAST)
- Map original data to top reference segments (BWA)
- Call new consensus (vcf2consensus.pl)
- Perform further iterations of steps 5 and 6 (mapping and consensus calling) to improve the new consensus (IterMap)
- Output final genome consensus
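
The early steps above can be sketched as a list of shell commands, shown here in Python. This is an illustration under stated assumptions, not the main script itself: the output file names, the k-mer size of 31, the `segment1` to `segment8` database names and the use of SAM flag 12 (read and mate both unmapped) are all choices made for the example. The arguments of vcf2consensus.pl and IterMap are not described here, so the iteration is shown only as a generic loop around a hypothetical `remap_and_call` function, with a stopping rule that is likewise an assumption.

```python
def plan_commands(sample, reads_1, reads_2, host_ref, kmer=31):
    """Return shell commands for host filtering, assembly and BLAST search.

    Assumes host_ref has already been indexed with `bwa index`.
    """
    nonhost_bam = f"{sample}.nonhost.bam"
    nonhost_1 = f"{sample}.nonhost_1.fastq"
    nonhost_2 = f"{sample}.nonhost_2.fastq"
    assembly_dir = f"{sample}_velvet"
    contigs = f"{assembly_dir}/contigs.fa"
    commands = [
        # Map to the host and keep pairs where both reads are unmapped.
        f"bwa mem {host_ref} {reads_1} {reads_2} "
        f"| samtools view -b -f 12 -o {nonhost_bam} -",
        f"samtools fastq -1 {nonhost_1} -2 {nonhost_2} {nonhost_bam}",
        # Assemble the non-host reads.
        f"velveth {assembly_dir} {kmer} -shortPaired -fastq -separate "
        f"{nonhost_1} {nonhost_2}",
        f"velvetg {assembly_dir}",
    ]
    # Search the contigs against each of the eight segment databases.
    for seg in range(1, 9):
        commands.append(
            f"blastn -query {contigs} -db segment{seg} -outfmt 6 "
            f"-max_target_seqs 1 -out {sample}.segment{seg}.blast.tsv"
        )
    return commands


def iterate_consensus(reference, remap_and_call, max_rounds=5):
    """Repeat mapping and consensus calling until the consensus stops changing.

    remap_and_call is a hypothetical callable that maps the reads to
    `reference` and returns the newly called consensus sequence.
    """
    for _ in range(max_rounds):
        updated = remap_and_call(reference)
        if updated == reference:
            break
        reference = updated
    return reference
```

Returning the commands as strings rather than running them makes the plan easy to inspect or log before anything is executed.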