Installation
- Basic requirements
- Getting the code
- Getting the databases
- Testing (optional step)
- Ready!
- Interactive visualization of Recentrifuge results
- Numeric results for downstream applications
Python 3.6 or higher is required, but the minimum recommended version is Python 3.8; versions up to Python 3.11 are supported. If you need help installing and setting up Python, please consult Python Setup and Usage. Recentrifuge uses no modules beyond the Python Standard Library, with the exception of biopython and, optionally:
- pandas, for exporting results to CSV or TSV as extra files or for testing Recentrifuge.
- openpyxl, additionally, for pandas to export results in Excel format.
- matplotlib and xlrd, in addition to the previous packages, for comprehensive testing of the Recentrifuge package.
Installing Recentrifuge inside a virtual environment is not mandatory but, in some circumstances, it can avoid installation issues. If you need one, detailed instructions are available in preparing a virtualenv for Recentrifuge.
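As a quick sanity check of the requirements above, the following minimal sketch reports whether the interpreter meets the minimum version and which of the listed packages can be found. The function name check_environment and the report layout are hypothetical and not part of Recentrifuge; the only assumption about the packages is their import names (biopython is imported as Bio, and the others use their package name).

```python
import sys
from importlib.util import find_spec

# Assumption: import names. biopython is imported as "Bio"; the other
# packages are imported under the same name as their PyPI package.
REQUIRED = {"biopython": "Bio"}
OPTIONAL = {
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "matplotlib": "matplotlib",
    "xlrd": "xlrd",
}


def check_environment(min_version=(3, 6)):
    """Return a report on the Python version and package availability."""
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for package, module in {**REQUIRED, **OPTIONAL}.items():
        # find_spec locates a top-level module without importing it
        report[package] = find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing optional package only matters if you plan to export to CSV, TSV, or Excel, or to run the tests.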
- Option 0: Install using the conda package.
- Option 1: Get and install the Recentrifuge PyPI package.
  $ pip install recentrifuge
- Option 2: Clone the Recentrifuge repository on GitHub and get the required (and recommended) dependencies.
  - Please see cloning a repository in GitHub for help on cloning the repo. You will need git installed on your system; in case it is not, please check installing Git. Typically:
    $ git clone https://github.com/khyox/recentrifuge.git
  - Install biopython. In case you need help with this installation, have a look at biopython installation instructions. For a typical pip installation:
    $ pip install biopython
  - To test Recentrifuge or to be able to export results to CSV, TSV, or Excel, you will also need some additional packages, which you can install easily with pip:
    $ pip install numpy openpyxl xlrd matplotlib pandas
Should you need help installing pandas or openpyxl, please check pandas installation instructions or openpyxl installation instructions.
In the directory where you cloned the repository, execute retaxdump. It will download and unzip the required local databases from NCBI servers under the subdirectory taxdump. For the importance of keeping the NCBI database updated, please check this. For LMAT plasmids support, please read this.
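After the download, you may want to confirm that the taxdump subdirectory is populated. The sketch below is not part of Recentrifuge: the function name is hypothetical, and it assumes that names.dmp and nodes.dmp, the core files of the NCBI taxdump archive, are the ones worth checking. Recentrifuge may rely on further files.

```python
from pathlib import Path

# Assumption: names.dmp and nodes.dmp, the core files of the NCBI taxdump
# archive, are the ones to look for; Recentrifuge may need others too.
EXPECTED_FILES = ("names.dmp", "nodes.dmp")


def missing_taxdump_files(taxdump_dir="taxdump"):
    """Return the expected files that are absent or empty."""
    base = Path(taxdump_dir)
    return [
        name for name in EXPECTED_FILES
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_taxdump_files()
    print("taxdump looks complete" if not missing else f"missing: {missing}")
```

An empty list suggests the download and unzip step finished; any name in the list is a hint to run retaxdump again.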
Recentrifuge development is tested by an automatic continuous integration system (check Recentrifuge's Travis CI page for details). Please see comprehensive instructions about testing and validating your Recentrifuge installation here.
At this point, Recentrifuge is ready to analyze your samples:
- if you have results from Centrifuge, please see running Recentrifuge for Centrifuge,
- in case you have LMAT outputs, your choice is running Recentrifuge for LMAT,
- if you have CLARK results, please see running Recentrifuge for CLARK,
- in case you have data from Kraken, please check running Recentrifuge for Kraken,
- if you use any other taxonomic classifier, please see running Recentrifuge for a generic classifier.
Just open the HTML file generated by Recentrifuge with any JavaScript-enabled browser. Firefox or Chrome are recommended.
Recentrifuge generates CSV/TSV extra files or an Excel file with various sheets containing diverse statistics and detailed numeric results useful for downstream applications.
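For downstream use, a TSV file can be loaded with the Python Standard Library alone. This is only a sketch under stated assumptions: it supposes a single header row, and the helper names read_tsv and column_total are hypothetical. The actual file names and column layout depend on your Recentrifuge run and are not specified here.

```python
import csv


def read_tsv(path):
    """Load a tab-separated file as a list of dicts keyed by column header."""
    # Assumption: the first row of the file holds the column headers.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def column_total(rows, column):
    """Sum a numeric column, skipping empty, missing, or non-numeric cells."""
    total = 0.0
    for row in rows:
        try:
            total += float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
    return total
```

If pandas is installed, its read_csv function with sep="\t" is the more convenient choice for larger tables.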
Support for Python 3.6 will be dropped very soon. Support for Python 3.7 will be dropped soon after it reaches end of life in 2023.
Python versions below 3.6 are not supported, as Recentrifuge uses syntax features introduced in Python 3.6, such as the syntax for variable annotations (PEP 526) and formatted string literals (PEP 498). The syntax for type annotations was introduced in Python 3.5 (PEP 484), but it was with Python 3.6 that it reached maturity by covering variable annotations. Powerful tools for static type analysis in Python have evolved along with these standards. The development of Recentrifuge includes checks with pylint and mypy. Code that aims to perform a robust comparative metagenomic analysis is a very good candidate for robust coding.
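To illustrate the two Python 3.6 features mentioned above, here is a short, self-contained example. It is not taken from the Recentrifuge code base; all names and values are hypothetical and chosen only for illustration.

```python
from typing import Dict

# PEP 526: variables are annotated directly, without type comments
min_score: int = 35
reads_by_taxid: Dict[int, int] = {}

reads_by_taxid[9606] = 1200  # illustrative value only


def summarize(taxid: int) -> str:
    """Build a message with a formatted string literal (PEP 498)."""
    count: int = reads_by_taxid.get(taxid, 0)
    return f"taxid {taxid}: {count} reads (min score {min_score})"


print(summarize(9606))  # prints "taxid 9606: 1200 reads (min score 35)"
```

On Python 3.5 or earlier, both the annotated assignments and the f-string raise a SyntaxError, which is why those versions cannot run this kind of code. Annotations like these are what static checkers such as mypy read.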
One of the most interesting but still little-known features of the LMAT software is its ability to properly classify over 4000 plasmids. These plasmids are assigned a taxonomical id (taxid) beyond the NCBI system. Recentrifuge offers support for this extended classification but requires the LMAT-provided file plasmid.names.txt to be located in the same directory as the NCBI nodes information files. This location is controlled by the flag -n/--nodespath.
If Recentrifuge finds the plasmid.names.txt file, it will parse the plasmids' taxid and name using ad hoc regular expressions in order to present the user with a meaningful name for the very diverse plasmids. Recentrifuge also checks that every plasmid in the file is compatible with the NCBI taxonomy in use, so that only those passing the check are added. Further details here.
Before you can follow the step-by-step instructions below, you need pyenv installed on your system. If you don't have it installed on your computer, please see pyenv installation instructions.
If you already have it on your system, please update pyenv:
> pyenv update
Get the filtered list of available Python versions:
> pyenv install --list | grep " 3\.[891]"
Install the desired Python version (for example, 3.11.1):
> pyenv install 3.11.1
Check versions to be sure that Python 3.11.1 is now available:
> pyenv versions
Create and check the new virtual environment:
> pyenv virtualenv 3.11.1 rcf_3.11.1
> pyenv virtualenvs
Activate the virtual environment and update pip:
> pyenv activate rcf_3.11.1
> pip install --upgrade pip
Now you can easily proceed to install Recentrifuge (see Option 1 under Getting the code) in your new virtual environment.
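Before installing, you can confirm from Python itself that the new environment is active. The sketch below is a generic check, not something Recentrifuge provides, and the function name is hypothetical. It assumes the environment was created with venv or a recent virtualenv, where sys.prefix differs from sys.base_prefix; older tools may behave differently.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # In a venv-style environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


if __name__ == "__main__":
    print("virtual environment:", in_virtual_environment())
    print("interpreter:", sys.executable)
    print("python:", ".".join(map(str, sys.version_info[:3])))
```

If the first line reports False after activation, the shell is probably still using a different interpreter than the one pyenv manages.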
If you use Recentrifuge in your research, please consider citing the paper. Thanks!
Martí JM (2019) Recentrifuge: Robust comparative analysis and contamination removal for metagenomics. PLOS Computational Biology 15(4): e1006967. https://doi.org/10.1371/journal.pcbi.1006967