Joint assembly of synthetic and paired end reads

olest edited this page Apr 20, 2015 · 9 revisions

The key parameter is, as so often, the genome size:

For small genomes (<120 Mb, depending on available RAM), you can use the SPAdes assembler to assemble the paired-end reads and add the synthetic long reads as trusted contigs (command-line parameter "--trusted-contigs"). This can improve the NGA50 by a factor of 2-3.
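As a rough illustration of the SPAdes route, the sketch below builds a command line that passes the paired-end reads with -1/-2 and the synthetic long reads with --trusted-contigs. The file names and the output directory are placeholders chosen for this example, not names from the original pipeline, and the script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: all file names and the output directory are assumed placeholders.
READ1=read1.fastq                  # paired-end reads, forward
READ2=read2.fastq                  # paired-end reads, reverse
SYNTHETIC=synthetic.reads.fasta    # synthetic long reads (assumed file name)
OUTDIR=spades.out                  # assumed output directory

# Build the SPAdes command line; the synthetic long reads are passed
# as trusted contigs alongside the paired-end library.
spades_cmd() {
    printf 'spades.py -1 %s -2 %s --trusted-contigs %s -o %s\n' \
        "$READ1" "$READ2" "$SYNTHETIC" "$OUTDIR"
}

# Print the command by default; execute it only when RUN=1 is set.
if [ "${RUN:-0}" = 1 ]; then
    spades_cmd | sh
else
    spades_cmd
fi
```

Printing first makes it easy to check the paths before committing to a long assembly run.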

For larger genomes (>100 Mb), we recommend first assembling the paired-end reads with a short-read assembler such as SGA or ABySS, and then co-assembling the resulting contigs and the synthetic long reads with an overlap assembler. We have had good experience with the Celera Assembler. Please find some notes on modifying it for synthetic long reads here.

An example of such a co-assembly pipeline is as follows:

Step 1: Create contigs from the paired-end reads using the ABySS assembler

abyss-pe name=wg.assembly k=64 in='read1.fastq read2.fastq'

The k-mer length k is an important parameter to tune; values around 50-70 usually give good results for paired-end reads.
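One way to tune k is to run the same assembly at several k-mer lengths, each in its own directory, and compare the results. The sketch below prints one abyss-pe command per value. The particular k values and the absolute read paths are assumptions made for this example, and values above the maximum k compiled into your ABySS build require rebuilding it with a larger --enable-maxk.

```shell
#!/bin/sh
# Sketch only: the k values and read locations are illustrative assumptions.
# Absolute paths are used because abyss-pe -C changes into the k$k directory.
READS="$PWD/read1.fastq $PWD/read2.fastq"

# Print one "mkdir + abyss-pe" command per k-mer length in the sweep.
abyss_sweep() {
    for k in 50 56 64 70; do
        printf "mkdir -p k%s && abyss-pe -C k%s name=wg.assembly k=%s in='%s'\n" \
            "$k" "$k" "$k" "$READS"
    done
}

# Inspect the commands first; pipe them to sh to actually run the sweep.
abyss_sweep
# abyss_sweep | sh
# abyss-fac k*/wg.assembly-contigs.fa   # compare contiguity across the runs
```

Keeping each run in a separate directory avoids overwriting intermediate files that share the same name= prefix.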

Step 2: Create a .frg file (the Celera Assembler representation of a sequencing library) for each input FASTQ file

fastqToCA -reads synthetic.reads.fq.gz -libraryname lib1 -technology moleculo > synthetic.reads.frg
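When there are several synthetic long-read files, the same call can be repeated once per file. The sketch below prints one fastqToCA command per input, numbering the libraries lib1, lib2, and so on, and deriving each .frg name from its FASTQ name. The input file names and the numbering scheme are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: input file names and the lib1, lib2, ... naming are assumed.

# Print one fastqToCA command per FASTQ file given as an argument.
frg_cmds() {
    i=0
    for fq in "$@"; do
        i=$((i + 1))
        base=$(basename "$fq")
        base=${base%.gz}       # drop a trailing .gz, if present
        base=${base%.fastq}    # drop a trailing .fastq, if present
        base=${base%.fq}       # drop a trailing .fq, if present
        printf 'fastqToCA -reads %s -libraryname lib%s -technology moleculo > %s.frg\n' \
            "$fq" "$i" "$base"
    done
}

# Inspect the commands first; pipe them to sh to create the .frg files.
frg_cmds synthetic.lane1.fq.gz synthetic.lane2.fq.gz
# frg_cmds synthetic.lane1.fq.gz synthetic.lane2.fq.gz | sh
```

Giving each file its own library name keeps the libraries distinguishable in the assembler's output.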

Step 3: Create a spec file for the Celera Assembler by following this example

Step 4: Run the Celera Assembler

runCA -d assembly.out.dir -p assembly.prefix -s spec.file

© 2015 Illumina, Inc. All rights reserved.