maxibor/mgenottate

Mgenottate: (Meta) GENOme ANNOTTATion

Takes genomes as input, computes completeness/contamination QC metrics with BUSCO, dereplicates them with dRep, assigns taxonomy with MMseqs2, and provides a summary table at the end.

graph LR
    a[genome fasta] --> b[busco quality assessment]
    b --> c[dRep genome ANI dereplication]
    c --> d[MMSeqs2 genome taxonomic_annotation]
    d --> e[Summary table]

Usage

nextflow run maxibor/mgenottate -profile {conda,docker,singularity} --input genome_sheet.csv --busco_db path/to/busco/db --mmseqs2_db_path path/to/mmseqs/db

Input/output options

Define where the pipeline should find input data and save output data.

| Parameter | Description | Type | Default | Required | Hidden |
|---|---|---|---|---|---|
| input | Path to comma-separated file containing information about the samples and genomes. See below for more information. | string | | True | |
| outdir | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud | | | | |

An example input file can be found in tests/data/test_samplesheet.csv

It contains two columns: the first is the name of the sample to which a genome belongs, and the second is the path to the genome FASTA file (compressed or not).
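
As an illustration of the two-column layout described above, the following is a minimal sketch that writes a samplesheet for one sample from a directory of FASTA files. The helper name `make_samplesheet` and the header names `sample` and `fasta` are assumptions made for this example; check tests/data/test_samplesheet.csv for the header the pipeline actually expects.

```shell
#!/bin/sh
# Sketch: write a two-column samplesheet (sample name, genome path).
# ASSUMPTION: the header names "sample" and "fasta" are hypothetical;
# compare with tests/data/test_samplesheet.csv before use.
make_samplesheet() {
    sample="$1"      # sample name shared by every genome in the directory
    genome_dir="$2"  # directory holding .fa/.fasta files, gzipped or not
    echo "sample,fasta"
    for f in "$genome_dir"/*.fa "$genome_dir"/*.fa.gz \
             "$genome_dir"/*.fasta "$genome_dir"/*.fasta.gz; do
        [ -e "$f" ] || continue   # skip glob patterns that matched nothing
        echo "$sample,$f"
    done
}
```

It could be used as `make_samplesheet sampleA /abs/path/to/genomes > genome_sheet.csv`, with the resulting file passed to `--input`.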

Databases

| Parameter | Description | Type | Default | Required | Hidden |
|---|---|---|---|---|---|
| busco_db | Path to BUSCO database | string | | True | |
| mmseqs2_db_name | Name of MMseqs2 prebuilt database (required if no db path is provided) | string | | | |
| mmseqs2_db_path | Path to MMseqs2 database (required if no db name is provided) | string | | | |

See MMSeqs2 wiki for valid MMSeqs DB names.
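
Because at least one of the two MMseqs2 database options has to be set, a small pre-flight check can catch a missing value before the pipeline is launched. The sketch below is only an illustration: `check_mmseqs_db` is a hypothetical helper, not part of the pipeline, and its preference for the path when both values are given is an arbitrary choice of this example rather than documented pipeline behaviour.

```shell
#!/bin/sh
# Sketch: make sure an MMseqs2 database was specified and print the flag to use.
# ASSUMPTION: check_mmseqs_db is a hypothetical helper; preferring the path
# over the name when both are set is a choice made for this example only.
check_mmseqs_db() {
    db_name="$1"  # value intended for --mmseqs2_db_name (may be empty)
    db_path="$2"  # value intended for --mmseqs2_db_path (may be empty)
    if [ -z "$db_name" ] && [ -z "$db_path" ]; then
        echo "error: set --mmseqs2_db_name or --mmseqs2_db_path" >&2
        return 1
    fi
    if [ -n "$db_path" ]; then
        echo "--mmseqs2_db_path $db_path"
    else
        echo "--mmseqs2_db_name $db_name"
    fi
}
```

For example, `check_mmseqs_db "" path/to/mmseqs/db` prints `--mmseqs2_db_path path/to/mmseqs/db`, while calling it with two empty strings reports an error and returns a non-zero status.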

Tools options

| Parameter | Description | Type | Default | Required | Hidden |
|---|---|---|---|---|---|
| busco_mode | BUSCO mode. One of genome, proteins, or transcriptome | string | genome | | |
| busco_lineage | BUSCO lineage. auto for automatic lineage selection | string | auto | | |
| drep_ani | dRep secondary clustering ANI threshold | number | 0.99 | | |
| mmseqs2_mem | Amount of memory for MMseqs2 (in GB) | string | '14G' | | |
| mmseqs2_search_type | 2 (translated), 3 (nucleotide) or 4 (translated nucleotide backtrace) | integer | null (auto) | | |

Max job request options

Set the top limit for requested resources for any single job.

| Parameter | Description | Type | Default | Required | Hidden |
|---|---|---|---|---|---|
| max_cpus | Maximum number of CPUs that can be requested for any single job. Use to set an upper limit for the CPU requirement for each process. Should be an integer, e.g. `--max_cpus 1` | integer | 16 | | True |
| max_memory | Maximum amount of memory that can be requested for any single job. Use to set an upper limit for the memory requirement for each process. Should be a string in the format integer-unit, e.g. `--max_memory '8.GB'` | string | 128.GB | | True |
| max_time | Maximum amount of time that can be requested for any single job. Use to set an upper limit for the time requirement for each process. Should be a string in the format integer-unit, e.g. `--max_time '2.h'` | string | 240.h | | True |
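
To show how the three caps fit on the command line, here is a minimal sketch that assembles the flags and prints a full launch command without running it. `resource_flags` is a hypothetical helper, and the values 8, 32.GB and 48.h are example figures, not recommendations.

```shell
#!/bin/sh
# Sketch: build the resource-cap flags and print (not execute) a launch command.
# ASSUMPTION: resource_flags is a hypothetical helper; the values are examples.
resource_flags() {
    cpus="$1"      # integer, e.g. 8
    memory="$2"    # integer-unit string, e.g. 32.GB
    walltime="$3"  # integer-unit string, e.g. 48.h
    case "$cpus" in
        ''|*[!0-9]*) echo "error: max_cpus must be an integer" >&2; return 1 ;;
    esac
    echo "--max_cpus $cpus --max_memory '$memory' --max_time '$walltime'"
}

# Print the command for inspection; paste it into a terminal to run it.
echo "nextflow run maxibor/mgenottate -profile docker --input genome_sheet.csv" \
     "--busco_db path/to/busco/db --mmseqs2_db_path path/to/mmseqs/db" \
     "$(resource_flags 8 32.GB 48.h)"
```

Printing the command first makes it easy to confirm the quoting of the integer-unit strings before a long run is started.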

Generic options

Less common options for the pipeline, typically set in a config file.

| Parameter | Description | Type | Default | Required | Hidden |
|---|---|---|---|---|---|
| help | Display help text. | boolean | | | True |
| version | Display version and exit. | boolean | | | True |
| publish_dir_mode | Method used to save pipeline results to output directory. The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details. | string | copy | | True |
| monochrome_logs | Do not use coloured log outputs. | boolean | | | True |
