Assembly Tips
MEGAHIT uses a multiple k-mer strategy. The minimum k, the maximum k, and the step for iteration can be set with the options --k-min, --k-max, and --k-step, respectively. Every k must be an odd number, while the step must be an even number.
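The odd/even rule can be checked before launching a run. The following is a minimal sketch, not MEGAHIT's own code: the function name `k_values` is hypothetical, and it assumes the iteration simply starts at the minimum k and adds the step while the value does not exceed the maximum k. How MEGAHIT treats a maximum k that the step does not land on is not covered here.

```python
def k_values(k_min: int, k_max: int, k_step: int) -> list:
    """Validate the k settings and list the k values that are stepped through.

    Assumption: iteration starts at k_min and adds k_step while the value
    stays <= k_max. This mirrors the rules in the text, not MEGAHIT's source.
    """
    if k_min % 2 == 0 or k_max % 2 == 0:
        raise ValueError("k-min and k-max must be odd")
    if k_step <= 0 or k_step % 2 != 0:
        raise ValueError("k-step must be a positive even number")
    if k_min > k_max:
        raise ValueError("k-min must not exceed k-max")
    # odd + even stays odd, so every intermediate k is odd as well
    return list(range(k_min, k_max + 1, k_step))


print(k_values(27, 87, 10))  # [27, 37, 47, 57, 67, 77, 87]
```

Because an odd start plus an even step always yields an odd number, validating the two endpoints and the step is enough to guarantee that every k in the list is odd.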
- for ultra complex metagenomics data such as soil, a larger kmin, say 27, is recommended to reduce the complexity of the de Bruijn graph. Quality trimming is also recommended
- for high-depth generic data, a large --k-min (25 to 31) is recommended
- a smaller --k-step, say 10, is more friendly to low-coverage datasets
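As an illustration of the soil recommendation above, the sketch below assembles a command line as an argument list. The helper `megahit_command` and the file names are hypothetical; `-1`, `-2`, and `-o` are assumed here to be MEGAHIT's paired-end input and output directory options, so check `megahit --help` for your version before relying on them.

```python
import shlex


def megahit_command(reads_1, reads_2, out_dir, k_min=None, k_step=None):
    """Build a MEGAHIT argument list; unset options keep the tool's defaults."""
    cmd = ["megahit", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if k_min is not None:
        cmd += ["--k-min", str(k_min)]
    if k_step is not None:
        cmd += ["--k-step", str(k_step)]
    return cmd


# Hypothetical soil dataset: raise k-min to 27 as suggested for complex data.
soil = megahit_command("soil_R1.fq.gz", "soil_R2.fq.gz", "soil_out", k_min=27)
print(shlex.join(soil))
# megahit -1 soil_R1.fq.gz -2 soil_R2.fq.gz -o soil_out --k-min 27
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` without shell quoting problems.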
(kmin+1)-mers with multiplicity lower than d (default 2, specified by the --min-count option) are discarded. Be cautious about setting d below 2, as this leads to a much larger and noisier graph. We recommend the default value of 2 for metagenomics assembly. If you want to use MEGAHIT for generic assemblies, change this value according to the sequencing depth (we recommend --min-count 3 for >40x).
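The guidance above can be written as a small decision helper. This is only a sketch: the function name `suggest_min_count` is hypothetical, and because the text gives no specific value for generic data at 40x or below, the sketch assumes the default of 2 in that case.

```python
from typing import Optional


def suggest_min_count(metagenome: bool, depth: Optional[float] = None) -> int:
    """Suggest a --min-count value following the recommendations in the text."""
    if metagenome:
        # Default value, recommended for metagenomics assembly.
        return 2
    if depth is not None and depth > 40:
        # Generic assembly with more than 40x sequencing depth.
        return 3
    # Assumption: no specific advice is given here, so keep the default.
    return 2


print(suggest_min_count(metagenome=True))              # 2
print(suggest_min_count(metagenome=False, depth=60))   # 3
```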
Mercy k-mers are specially designed for metagenomics assembly to recover low-coverage sequences. For generic datasets with a depth of 30x or more, MEGAHIT may generate better results with the --no-mercy option.
The kmin 1-pass mode can be activated with the option --kmin-1pass. It is more memory efficient for ultra low-depth datasets, such as soil metagenomics data.
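The two mode switches, --no-mercy and --kmin-1pass, can be selected in the same spirit. The helper `mode_flags` below is hypothetical, and since the text only says --no-mercy "may" help generic datasets at 30x or more, the result is best treated as a starting point to compare against a default run.

```python
def mode_flags(generic_depth=None, ultra_low_depth=False):
    """Return optional MEGAHIT flags suggested by the tips in the text.

    generic_depth: sequencing depth of a generic (non-metagenomic) dataset,
                   or None for metagenomics data.
    ultra_low_depth: True for ultra low-depth data such as soil metagenomes.
    """
    flags = []
    if generic_depth is not None and generic_depth >= 30:
        flags.append("--no-mercy")
    if ultra_low_depth:
        flags.append("--kmin-1pass")
    return flags


print(mode_flags(generic_depth=50))       # ['--no-mercy']
print(mode_flags(ultra_low_depth=True))   # ['--kmin-1pass']
```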