Joint assembly of synthetic and paired end reads
The key parameter is, as so often, the genome size:
For small genomes (<100 Mb, depending on available RAM), you can use the SPAdes assembler to assemble the paired-end reads and add the synthetic long reads as trusted contigs (command-line parameter "--trusted-contigs"). This can improve the NGA50 by a factor of 2 to 3.
For larger genomes (>100 Mb), we recommend first assembling the paired-end reads with a short-read assembler such as SGA or ABySS, and then co-assembling the resulting contigs and the synthetic long reads with an overlap assembler. We have had good results with the Celera assembler. Please find some notes to modify it for synthetic long reads here.
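The choice between the two routes can be sketched as a small shell helper. This is an illustration only: the function names `choose_route` and `spades_cmd` are hypothetical, the 100 Mb default cutoff is an assumption based on the rough guidance above (the practical limit depends on available RAM), and all file and directory names are placeholders. `spades_cmd` only prints the command so that it can be inspected before running.

```shell
#!/bin/sh
# Sketch: pick an assembly route from an approximate genome size in Mb.
# The default cutoff of 100 Mb is an assumption; adjust it to your RAM.
choose_route() {
    size_mb=$1
    cutoff_mb=${2:-100}
    if [ "$size_mb" -lt "$cutoff_mb" ]; then
        echo spades
    else
        echo co-assembly
    fi
}

# Print (do not run) a SPAdes command that passes the synthetic long reads
# as trusted contigs. Arguments: read1, read2, synthetic reads, output dir.
spades_cmd() {
    printf 'spades.py -1 %s -2 %s --trusted-contigs %s -o %s\n' "$1" "$2" "$3" "$4"
}
```

For example, `spades_cmd read1.fastq read2.fastq synthetic.reads.fq.gz spades.out` prints the full command line, which can be piped to `sh` once it looks right.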
An example of such a co-assembly pipeline is as follows:
Step 1: Create contigs from the paired-end reads using the ABySS assembler
abyss-pe name=wg.assembly k=64 in='read1.fastq read2.fastq'
The k-mer length k is an important parameter to tweak; values around 50-70 usually give good results for paired-end reads.
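One way to tune k is to run one assembly per k value, each in its own directory, and compare the resulting contig statistics. The sketch below follows that pattern. The helper name `abyss_k_sweep`, the step size of 4, and the `k<k>` directory names are illustrative assumptions. Because `-C` makes `abyss-pe` change into the given directory first, the read paths must be absolute or relative to that directory.

```shell
#!/bin/sh
# Print one abyss-pe command per k value in 50..70 (step 4, an arbitrary
# choice). Each assembly gets its own directory k<k>.
# Arguments: assembly name, quoted list of read files.
abyss_k_sweep() {
    name=$1
    reads=$2
    for k in $(seq 50 4 70); do
        printf "mkdir -p k%s && abyss-pe -C k%s name=%s k=%s in='%s'\n" \
            "$k" "$k" "$name" "$k" "$reads"
    done
}
```

Calling `abyss_k_sweep wg.assembly '../read1.fastq ../read2.fastq'` prints the commands; pipe the output to `sh` to run them, then compare the contig files across the `k*` directories.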
Step 2: Create a .frg file (the Celera representation of a sequencing library) for each input FASTQ file
fastqToCA -reads synthetic.reads.fq.gz -libraryname lib1 -technology moleculo > synthetic.reads.frg
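With several synthetic-read FASTQ files, the same command can be generated once per file. The following is a minimal sketch, assuming each file should become its own library: the helper name `frg_cmds` and the `lib1`, `lib2`, ... naming scheme are hypothetical, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Print one fastqToCA command per input FASTQ file.
# Libraries are numbered lib1, lib2, ... in argument order; the .frg file
# is named after the FASTQ file with its .gz/.fq/.fastq suffix removed.
frg_cmds() {
    n=0
    for fq in "$@"; do
        n=$((n + 1))
        base=$(basename "$fq")
        base=${base%.gz}
        base=${base%.fq}
        base=${base%.fastq}
        printf 'fastqToCA -reads %s -libraryname lib%s -technology moleculo > %s.frg\n' \
            "$fq" "$n" "$base"
    done
}
```

For instance, `frg_cmds synthetic.reads.fq.gz` prints the command shown in Step 2; pipe the output to `sh` to create the files.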
Step 3: Create a spec file for the Celera assembler by following this example
Step 4: Run the Celera assembler
runCA -d assembly.out.dir -p assembly.prefix -s spec.file
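After the run, it can be useful to confirm that the assembler reached its final stage. The sketch below assumes that the Celera Assembler version in use writes its scaffolds to `<dir>/9-terminator/<prefix>.scf.fasta`; verify this layout against your installation. The helper name `ca_finished` is hypothetical.

```shell
#!/bin/sh
# Succeed if a non-empty scaffold FASTA exists at the location where
# Celera Assembler is assumed to write it:
#   <dir>/9-terminator/<prefix>.scf.fasta
# Arguments: output directory (-d), assembly prefix (-p).
ca_finished() {
    dir=$1
    prefix=$2
    [ -s "$dir/9-terminator/$prefix.scf.fasta" ]
}
```

A typical use would be `ca_finished assembly.out.dir assembly.prefix || echo "assembly incomplete" >&2`.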
© 2015 Illumina, Inc. All rights reserved.