When to use each of the SqueezeMeta modes

Javier Tamames edited this page Sep 29, 2020 · 9 revisions

SqueezeMeta can be run in four modes: sequential, coassembly, merged and seqmerge. Sequential mode treats each sample (as specified in the samples file) separately. Binning is not available in this mode, since the installed binning methods rely on the differential abundance of contigs across samples. Coassembly mode pools the reads from all samples and assembles them together. Merged and seqmerge modes assemble each sample separately and then merge the resulting contigs into a single coassembly.

When to use each of the four methods?

Sequential mode is useful when you have just one sample, do not intend to compare results between samples, or have disparate samples for which coassembly is not possible. Its performance is usually lower than that of the other modes (see the SqueezeMeta paper in Frontiers for an example). Binning will often require additional processing (de-replication) to recognize instances of the same bin in different samples. See for instance this paper.

Coassembly mode is the recommended method when you have more than one sample. Working with all reads simultaneously improves the assembly step, and you will likely obtain good contigs and good bins. However, coassembly is often computationally demanding. If you have several (big) samples, or limited computational resources, the coassembly may fail.

Merged mode can be used when coassembly is not feasible. It will probably give somewhat poorer results (for instance, higher disparity values in contigs and bins; see again the SqueezeMeta paper), but it will get the job done. Sometimes this is the only way to analyze a big set of metagenomes jointly.

Seqmerge mode follows the same approach as merged mode (assemble individual metagenomes first, then merge them), but instead of merging all metagenomes at once, it proceeds stepwise, merging the two closest metagenomes at each step. This lightens the computational requirements of the minimus2 algorithm, and can succeed in cases where merged mode would crash.
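
The guidance above can be summarized as a small decision helper. This is only a sketch of the reasoning on this page, not part of SqueezeMeta: the function name and all of its parameters are hypothetical, and whether coassembly or merging "fits" your resources is something you have to judge (or find out by trying) yourself.

```python
def suggest_mode(n_samples, compare_samples=True, samples_compatible=True,
                 coassembly_fits=True, merged_fits=True):
    """Suggest a SqueezeMeta mode following the rules of thumb on this page.

    All arguments are hypothetical inputs used for illustration:
      n_samples          -- number of samples in the samples file
      compare_samples    -- do you intend to compare results between samples?
      samples_compatible -- are the samples similar enough to be coassembled?
      coassembly_fits    -- do you expect a full coassembly to fit your resources?
      merged_fits        -- do you expect merging all assemblies at once to fit?
    """
    # One sample, no comparison intended, or disparate samples: sequential.
    if n_samples <= 1 or not compare_samples or not samples_compatible:
        return "sequential"
    # Coassembly is the recommended mode whenever it is feasible.
    if coassembly_fits:
        return "coassembly"
    # Otherwise assemble per sample and merge everything together.
    if merged_fits:
        return "merged"
    # Last resort: stepwise merging, lighter on the merging step.
    return "seqmerge"


print(suggest_mode(1))                          # sequential
print(suggest_mode(12, coassembly_fits=False))  # merged
```

In practice you would often start with coassembly and fall back to merged, and then to seqmerge, only if the previous attempt fails.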

A final piece of advice: when the run is finished, take a look at the mappingstats file. It tells you how well each sample is represented in the assembly. If the mapping percentages are low (let's say below 50%), be aware that most of your metagenomes are not actually represented in the analysis, because most of the reads failed to assemble. This can happen if the microbiome is very diverse, if your sequencing depth is insufficient, or both. In that case, one possibility is to run a parallel analysis using the raw reads (see our latest paper comparing methods in bioRxiv). You can use SqueezeMeta for reads for this task (available from version 0.5.0).
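
As an illustration of this check, here is a minimal sketch that flags samples whose mapping percentage falls below a threshold. It assumes the mappingstats file is a tab-separated table with comment lines starting with `#`, the sample name in the first column and the mapping percentage in the fourth; verify this against your own file and adjust `sample_col` and `perc_col` if the layout differs. The function name and the example values are made up for illustration.

```python
def low_mapping_samples(table_text, threshold=50.0, sample_col=0, perc_col=3):
    """Return {sample: percentage} for samples mapped below `threshold`.

    The column positions are assumptions about the file layout, not a
    documented format; change them to match your mappingstats file.
    """
    flagged = {}
    for line in table_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        try:
            perc = float(fields[perc_col])
        except (IndexError, ValueError):
            continue  # row is too short or the value is not numeric
        if perc < threshold:
            flagged[fields[sample_col]] = perc
    return flagged


# Made-up example content, for illustration only.
example = "\n".join([
    "# Sample\tTotal reads\tMapped reads\tMapping perc",
    "sampleA\t1000\t820\t82.0",
    "sampleB\t1000\t310\t31.0",
])
print(low_mapping_samples(example))  # {'sampleB': 31.0}
```

To run it on a real project, read the file contents with `open(path).read()` and pass the text to the function.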