Recentrifuge command line
Jose Manuel Martí edited this page Sep 30, 2024 · 11 revisions
The layout of the Recentrifuge (rcf) command (ver. 1.15.0) is:
usage: rcf [-h] [-V] [-n PATH] [--format GENERIC_FORMAT]
           (-f FILE | -g FILE | -l FILE | -r FILE | -k FILE) [-o FILE]
           [-e OUTPUT_TYPE] [-p] [--nohtml] [-a | -c CONTROLS_NUMBER]
           [-s SCORING] [-y NUMBER] [-m INT] [-x TAXID] [-i TAXID]
           [-z NUMBER] [-w INT] [-u SUMMARY_BEHAVIOR] [-t]
           [--nokollapse] [-d] [--strain] [--sequential]
Define Recentrifuge input files and formats
  -n PATH, --nodespath PATH
                        path for the nodes information files (nodes.dmp and
                        names.dmp from NCBI)
  --format GENERIC_FORMAT
                        format of the output files from a generic classifier
                        included with the option -g. It is a string like
                        "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0" where valid file
                        TYPes are csv/tsv/ssv, and the rest of fields indicate
                        the number of column used (starting in 1) for the
                        TaxIDs assigned, the LENgth of the read, the SCOre
                        given to the assignment, and the taxid code used for
                        UNClassified reads
  -f FILE, --file FILE  Centrifuge output files; if a single directory is
                        entered, every .out file inside will be taken as a
                        different sample; multiple -f is available to include
                        several Centrifuge samples
  -g FILE, --generic FILE
                        output file from a generic classifier; it requires the
                        flag --format (see such option for details); if a single
                        directory is entered, every file inside will be taken as a
                        different sample; multiple -g is available to include
                        several generic samples by filename
  -l FILE, --lmat FILE  LMAT output dir or file prefix; if just "." is
                        entered, every subdirectory under the current
                        directory will be taken as a sample and scanned
                        looking for LMAT output files; multiple -l is
                        available to include several samples.
  -r FILE, --clark FILE
                        CLARK full-mode output files; if a single directory is
                        entered, every .csv file inside will be taken as a
                        different sample; multiple -r is available to include
                        several CLARK, CLARK-l, and CLARK-S full-mode samples.
  -k FILE, --kraken FILE
                        Kraken output files; if a single directory is entered,
                        every .krk file inside will be taken as a different
                        sample; multiple -k is available to include several
                        Kraken (version 1 or 2) samples.
Related to the Recentrifuge output files
  -o FILE, --outprefix FILE
                        output prefix; if not given, it will be inferred from
                        input files; an HTML filename is still accepted for
                        backwards compatibility with legacy --outhtml option
  -e OUTPUT_TYPE, --extra OUTPUT_TYPE
                        type of extra output to be generated, and can be one
                        of ['FULL', 'CSV', 'MULTICSV', 'TSV', 'DYNOMICS']
  -p, --pickle          pickle (serialize) statistics and data results in
                        pandas DataFrames (format affected by selection of
                        --extra)
  --nohtml              suppress saving the HTML output file
Coarse tuning of algorithm parameters
  -a, --avoidcross      avoid cross analysis
  -c CONTROLS_NUMBER, --controls CONTROLS_NUMBER
                        this number of first samples will be treated as
                        negative controls; default is no controls
  -s SCORING, --scoring SCORING
                        type of scoring to be applied, and can be one of
                        ['SHEL', 'LENGTH', 'LOGLENGTH', 'NORMA', 'LMAT',
                        'CLARK_C', 'CLARK_G', 'KRAKEN', 'GENERIC']
  -y NUMBER, --minscore NUMBER
                        minimum score/confidence of the classification of a
                        read to pass the quality filter; all pass by default
  -m INT, --mintaxa INT
                        minimum taxa to avoid collapsing one level into the
                        parent (if not specified a value will be automatically
                        assigned)
  -x TAXID, --exclude TAXID
                        NCBI taxid code to exclude a taxon and all underneath
                        (multiple -x is available to exclude several taxid)
  -i TAXID, --include TAXID
                        NCBI taxid code to include a taxon and all underneath
                        (multiple -i is available to include several taxid);
                        by default, all the taxa are considered for inclusion
Fine tuning of algorithm parameters
  -z NUMBER, --ctrlminscore NUMBER
                        minimum score/confidence of the classification of a
                        read in control samples to pass the quality filter; it
                        defaults to "minscore"
  -w INT, --ctrlmintaxa INT
                        minimum taxa to avoid collapsing one level into the
                        parent (if not specified a value will be automatically
                        assigned)
  -u SUMMARY_BEHAVIOR, --summary SUMMARY_BEHAVIOR
                        choice for summary behaviour, and can be one of
                        ['ADD', 'ONLY', 'AVOID']
  -t, --takeoutroot     remove counts directly assigned to the "root" level
  --nokollapse          show the "cellular organisms" taxon
Advanced modes of running
  -d, --debug           increase output verbosity and perform additional
                        checks (default: False)
  --sequential          deactivate parallel processing (default: False)
  --strain              set strain level instead of species as the resolution
                        limit for the robust contamination removal algorithm;
                        use with caution, this is an EXPERIMENTAL feature
Other useful arguments
  -h, --help            show the help message and exit
  -V, --version         show program's version number and exit
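
The `--format` string described in the help is compact, so a small illustration of how its fields break down may help. The following is a minimal sketch in Python, not Recentrifuge's own parser: the function name `parse_generic_format` and its validation rules are assumptions made for illustration, based only on the help text (valid `TYP` values csv/tsv/ssv, column numbers starting at 1 for `TID`, `LEN` and `SCO`, and `UNC` as the taxid code for unclassified reads).

```python
def parse_generic_format(spec: str) -> dict:
    """Split a string like "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0" into fields.

    Hypothetical helper for illustration; Recentrifuge's internal
    parsing may differ.
    """
    valid_types = {"csv", "tsv", "ssv"}
    required = {"TYP", "TID", "LEN", "SCO", "UNC"}

    # Collect KEY:VALUE pairs separated by commas
    fields = {}
    for item in spec.split(","):
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in field {item!r}")
        fields[key.strip().upper()] = value.strip()

    missing = required - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    file_type = fields["TYP"].lower()
    if file_type not in valid_types:
        raise ValueError(f"unsupported file type {fields['TYP']!r}")

    # UNC is a taxid code, not a column number, so it is kept as text
    parsed = {"TYP": file_type, "UNC": fields["UNC"]}
    for key in ("TID", "LEN", "SCO"):
        column = int(fields[key])
        if column < 1:
            raise ValueError(f"{key} column must be >= 1 (1-based)")
        parsed[key] = column
    return parsed


if __name__ == "__main__":
    print(parse_generic_format("TYP:csv,TID:1,LEN:3,SCO:6,UNC:0"))
```

Read this way, the example string from the help means: a CSV file with the TaxID in column 1, the read length in column 3, the score in column 6, and `0` as the taxid code marking unclassified reads. Such a string would accompany `-g`, for instance `rcf -n taxdump -g sample.csv --format "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0"`, where the `taxdump` and `sample.csv` paths are placeholders.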
If you use Recentrifuge in your research, please consider citing the paper. Thanks!
Martí JM (2019) Recentrifuge: Robust comparative analysis and contamination removal for metagenomics. PLOS Computational Biology 15(4): e1006967. https://doi.org/10.1371/journal.pcbi.1006967