Mgenottate: (Meta) GENOme ANNOTTATion
Takes genomes as input, computes completeness/contamination QC metrics with BUSCO, dereplicates them with dRep, assigns taxonomy with MMseqs2, and provides a summary table at the end.
graph LR
a[genome fasta] --> b[busco quality assessment]
b --> c[dRep genome ANI dereplication]
c --> d[MMSeqs2 genome taxonomic_annotation]
d --> e[Summary table]
nextflow run maxibor/mgenottate -profile {conda,docker,singularity} --input genome_sheet.csv --busco_db path/to/busco/db --mmseqs2_db_path path/to/mmseqs/db
Define where the pipeline should find input data and save output data.
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
input | Path to a comma-separated file containing information about the samples and genomes. See below for details. | string | | True | |
outdir | The output directory where the results will be saved. Use absolute paths when saving to cloud storage. | | | | |

An example input file can be found in tests/data/test_samplesheet.csv.
It contains two columns: the first is the name of the sample to which a genome belongs, and the second is the path to the genome FASTA file (compressed or not).
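
For a folder of genomes, the two-column sheet described above can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline: the header names `sample` and `fasta` and the list of FASTA suffixes are assumptions, so check tests/data/test_samplesheet.csv for the actual header before using it.

```python
import csv
from pathlib import Path

# Hypothetical column names: compare with tests/data/test_samplesheet.csv.
HEADER = ("sample", "fasta")
# Assumed FASTA file endings, plain or gzip-compressed.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".fa.gz", ".fasta.gz", ".fna.gz")


def build_rows(genome_dir, sample_name):
    """Return (sample, absolute path) rows for every FASTA file in genome_dir."""
    genome_dir = Path(genome_dir)
    return [
        (sample_name, str(path.resolve()))
        for path in sorted(genome_dir.iterdir())
        if path.is_file() and path.name.endswith(FASTA_SUFFIXES)
    ]


def write_samplesheet(rows, out_path, header=HEADER):
    """Write the header followed by one line per genome."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `write_samplesheet(build_rows("genomes/", "sample1"), "genome_sheet.csv")` would produce a file that can be passed to `--input`, provided the header matches what the pipeline expects.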
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
busco_db | Path to the BUSCO database | string | | True | |
mmseqs2_db_name | Name of a prebuilt MMseqs2 database (required if no database path is provided) | string | | | |
mmseqs2_db_path | Path to an MMseqs2 database (required if no database name is provided) | string | | | |

See the MMseqs2 wiki for valid MMseqs2 database names.
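
Since at least one of the two MMseqs2 database options has to be set, a launcher script can check this before starting Nextflow. The sketch below assembles the command from the usage line as a list of arguments. The helper names are hypothetical, and how the pipeline behaves when both options are given is not documented here, so the sketch simply passes both through.

```python
def mmseqs2_db_args(db_name=None, db_path=None):
    """Return the MMseqs2 database flags, requiring at least one of them."""
    if not db_name and not db_path:
        raise ValueError("set mmseqs2_db_name or mmseqs2_db_path")
    args = []
    if db_name:
        args += ["--mmseqs2_db_name", db_name]
    if db_path:
        args += ["--mmseqs2_db_path", db_path]
    return args


def build_command(samplesheet, busco_db, profile, db_name=None, db_path=None):
    """Assemble the nextflow command line as a list suitable for subprocess.run."""
    return [
        "nextflow", "run", "maxibor/mgenottate",
        "-profile", profile,
        "--input", samplesheet,
        "--busco_db", busco_db,
        *mmseqs2_db_args(db_name, db_path),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; building a list instead of a single string avoids shell quoting problems with paths that contain spaces.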
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
busco_mode | BUSCO mode. One of genome, proteins, or transcriptome. | string | genome | | |
busco_lineage | BUSCO lineage. Use auto for automatic lineage selection. | string | auto | | |
drep_ani | dRep secondary clustering ANI threshold | number | 0.99 | | |
mmseqs2_mem | Amount of memory for MMseqs2 (in GB) | string | '14G' | | |
mmseqs2_search_type | 2 (translated), 3 (nucleotide) or 4 (translated nucleotide backtrace) | integer | null (auto) | | |

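
The options in the table above can also be kept in a JSON file and passed to Nextflow with `-params-file`. Here is a minimal sketch that starts from the documented defaults and applies a few sanity checks. The function names are hypothetical, and the check that `drep_ani` is a fraction between 0 and 1 is an assumption based on the default of 0.99.

```python
import json

# Defaults copied from the parameter table above.
DEFAULTS = {
    "busco_mode": "genome",
    "busco_lineage": "auto",
    "drep_ani": 0.99,
    "mmseqs2_mem": "14G",
    "mmseqs2_search_type": None,  # null means automatic selection
}
BUSCO_MODES = {"genome", "proteins", "transcriptome"}
SEARCH_TYPES = {None, 2, 3, 4}


def build_params(**overrides):
    """Merge overrides into the defaults and reject values the table rules out."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["busco_mode"] not in BUSCO_MODES:
        raise ValueError(f"busco_mode must be one of {sorted(BUSCO_MODES)}")
    if params["mmseqs2_search_type"] not in SEARCH_TYPES:
        raise ValueError("mmseqs2_search_type must be 2, 3, 4 or unset")
    # Assumption: the ANI threshold is expressed as a fraction, not a percentage.
    if not 0 < params["drep_ani"] <= 1:
        raise ValueError("drep_ani is expected to be in (0, 1]")
    return params


def write_params_file(path, **overrides):
    """Write the parameters as JSON, leaving out unset (None) values."""
    params = {k: v for k, v in build_params(**overrides).items() if v is not None}
    with open(path, "w") as handle:
        json.dump(params, handle, indent=2)
```

After `write_params_file("params.json", drep_ani=0.95)`, the file can be supplied with `nextflow run maxibor/mgenottate -params-file params.json` alongside the required input and database options.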
Set the top limit for requested resources for any single job.
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
max_cpus | Maximum number of CPUs that can be requested for any single job. Use to set an upper limit for the CPU requirement for each process. Should be an integer, e.g. `--max_cpus 1` | integer | 16 | | True |
max_memory | Maximum amount of memory that can be requested for any single job. Use to set an upper limit for the memory requirement for each process. Should be a string in the format integer-unit, e.g. `--max_memory '8.GB'` | string | 128.GB | | True |
max_time | Maximum amount of time that can be requested for any single job. Use to set an upper limit for the time requirement for each process. Should be a string in the format integer-unit, e.g. `--max_time '2.h'` | string | 240.h | | True |

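
The integer-unit format used by `max_memory` and `max_time` is easy to mistype, so a wrapper script may want to check the values before launching a run. The sketch below covers only the dotted form shown in the table (such as `8.GB` or `2.h`) with an assumed set of common unit suffixes; Nextflow itself may accept other spellings.

```python
import re

# Assumed unit lists; only the "integer.unit" form from the table is checked.
MEMORY_RE = re.compile(r"^(\d+)\.(B|KB|MB|GB|TB)$")
TIME_RE = re.compile(r"^(\d+)\.(ms|s|m|h|d)$")


def parse_limit(value, pattern):
    """Split a limit such as '128.GB' into (128, 'GB') or raise ValueError."""
    match = pattern.match(value)
    if match is None:
        raise ValueError(f"{value!r} is not in integer-unit format")
    return int(match.group(1)), match.group(2)


def check_limits(max_cpus, max_memory, max_time):
    """Validate the three resource caps and return them in parsed form."""
    if not isinstance(max_cpus, int) or max_cpus < 1:
        raise ValueError("max_cpus should be a positive integer")
    return {
        "max_cpus": max_cpus,
        "max_memory": parse_limit(max_memory, MEMORY_RE),
        "max_time": parse_limit(max_time, TIME_RE),
    }
```

With the defaults from the table, `check_limits(16, "128.GB", "240.h")` passes, while a value such as `"8GB"` (missing the dot) is rejected.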
Less common options for the pipeline, typically set in a config file.
Parameter | Description | Type | Default | Required | Hidden |
---|---|---|---|---|---|
help | Display help text. | boolean | | | True |
version | Display version and exit. | boolean | | | True |
publish_dir_mode | Method used to save pipeline results to output directory. The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details. | string | copy | | True |
monochrome_logs | Do not use coloured log outputs. | boolean | | | True |