Release MMseqs2 Release 8-fac81 · soedinglab/MMseqs2

At a glance: Faster searches and clustering through improved IO and better seeding. More search modes like tblastx, reciprocal best hit and linsearch. New output format SAM. Support for compressed databases to reduce hard disk and memory requirements.

Known Issues

Iterative search only works up to 2 iterations

Breaking Changes

MMseqs2 now saves a lot on IO by not merging result datafiles
There is still a single .index file, but the corresponding data file is now split into multiple parts (one part per thread that was used to write it)
MMseqs2 now uses the VTML80 [1,2] substitution matrix to speed up the prefiltering (changeable by --seed-sub-mat). The final alignment is still computed with Blosum62 (changeable by --sub-mat)
All databases now have a .dbtype file
MMseqs2 Docker image is now based on Debian instead of Alpine
Changed Orf header format to be more space efficient. The new format is now orignIdentifer startPos(-/+)len flag
prefilter returns ungapped-alignment scores instead of e-values
createindex: the file extension is now .idx instead of the previous .[s]k[6,7] format
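
Scripts that read result data files directly may need adjusting for the split layout described above. The following is a minimal sketch, not MMseqs2 code. It assumes that the offsets stored in the single .index file are global, i.e. that they count bytes as if the parts were concatenated in numeric order, and that the parts are named with a numeric suffix. The helper names `locate` and `read_entry` and the `result.N` file names are hypothetical.

```python
import bisect
import os
import tempfile


def locate(part_sizes, offset):
    """Map a global byte offset to (part_number, offset_inside_that_part)."""
    # Start offset of every part, assuming the parts are concatenated in order.
    starts = [0]
    for size in part_sizes[:-1]:
        starts.append(starts[-1] + size)
    # Last part whose start is <= offset.
    part = bisect.bisect_right(starts, offset) - 1
    return part, offset - starts[part]


def read_entry(part_paths, offset, length):
    """Read `length` bytes for an index entry that starts at global `offset`."""
    sizes = [os.path.getsize(path) for path in part_paths]
    part, local_offset = locate(sizes, offset)
    with open(part_paths[part], "rb") as handle:
        handle.seek(local_offset)
        return handle.read(length)


# Demo with two tiny stand-in parts (the file names are an assumption).
with tempfile.TemporaryDirectory() as workdir:
    paths = []
    for number, payload in enumerate([b"AAAA\x00", b"BBB\x00"]):
        path = os.path.join(workdir, f"result.{number}")
        with open(path, "wb") as handle:
            handle.write(payload)
        paths.append(path)
    # An entry at global offset 5 lies at the start of the second part.
    entry = read_entry(paths, offset=5, length=4)

print(entry)
```

If the global-offset assumption does not hold for a given database, use the MMseqs2 modules to access entries instead of reading the parts by hand.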

Features

Support for tblastx-style nucl-nucl translated searches
mmseqs search nuclDB1 nuclDb2 aln tmp --search-mode 2
Support for nucleotide searches
mmseqs search nuclDB1 nuclDb2 aln tmp --search-mode 3
convertalis has learned to return SAM formatted output (preview)
Database can be compressed by applying zstd on each entry (--compressed 1)
- Also added compress and decompress modules
rbh workflow for reciprocal best hit searches added
linclust can now cluster nucleotide sequences on both forward and reverse strand
Added linsearch, a lightning fast search for proteins and nucleotide sequences (preview; easy workflow variant easy-linsearch also added)
createlinindex computes an index for linsearch
taxonomy uses --orf-start-mode 1 to annotate more sequences
Added approx. 2bLCA to speed up computation. This is now the new default; the old mode can be turned on by --lca-mode 2
createdb recognizes sequences containing Uracil as DNA sequences
createdb is now faster through speeding up its shuffle operations
view module to view single entry in an MMseqs2 database
align module has learned --min-aln-len parameter to filter by minimal alignment length
Alignment modules (rescorediagonal, align) can align longer sequences now (not limited to 2^15 length)
Input sequences can now be soft-masked (lowercase letter masking) instead of only hard-masked (replacing with X) with `--mask-lower-case`. The masking only applies to the prefilter stages `kmermatcher` or `prefilter` and can be combined with `--mask`
filterdb has learned --filter-expression parameter and mode that allows filtering by simple mathematical expressions
alignbykmer can be used for nucleotide searches
MMseqs2 did-you-mean functionality gives better suggestions
MMseqs2 does not repeat the whole parameter list for each submodule call anymore
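
To illustrate what the rbh workflow listed above reports, here is a minimal sketch of the general reciprocal-best-hit idea. It is not the MMseqs2 implementation. The function names, the (query, target, score) tuple layout and the example scores are all made up for illustration.

```python
def best_hits(hits):
    """Return {query: target}, keeping the highest-scoring target per query."""
    best = {}
    for query, target, score in hits:
        # On ties the first hit seen is kept.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical alignment scores for a search A -> B and the reverse B -> A.
a_vs_b = [
    ("a1", "b1", 250.0),
    ("a1", "b2", 90.0),
    ("a2", "b2", 120.0),
    ("a3", "b1", 200.0),
]
b_vs_a = [
    ("b1", "a1", 248.0),
    ("b1", "a3", 199.0),
    ("b2", "a2", 118.0),
    ("b3", "a3", 60.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

Here a3 has b1 as its best hit, but b1 prefers a1, so the pair (a3, b1) is not reciprocal and is dropped.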

Bugs

Default parameters of map workflow are now set correctly
Some modules were using the wrong coverage parameter
Sliced profile search was losing high E-value hits
Sliced profile search is now stable
Profile-Sequence alignment E-values were slightly too high
result2msa was crashing with profiles on the target side
result2msa should not crash with --allow-deletion anymore
Some parameters were never visible (with or without -h)
Various issues with MPI were resolved

Developers

Continuous integration enforces no compile warnings now
Continuous integration now tries to build AArch64 builds with Docker and Qemu
We added a first draft of our developer guide to the wiki

References

[1] Müller T, Vingron M. Modeling amino acid replacement. J Comput Biol. 2000;7:761–76. doi: 10.1089/10665270050514918

[2] Müller T, Spang R, Vingron M. Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. Mol Biol Evol. 2002;19:8–13. doi: 10.1093/oxfordjournals.molbev.a003985
