7 SourceCode Installation

Christopher Neely edited this page Sep 6, 2021 · 18 revisions

This page describes the build process for the deprecated Docker source code version.

Users should check out the Conda installation instructions for more information about downloading MetaSanity to their system.


MetaSanity is available for download as a source code version. The source code version works with existing installations of the program dependencies, so it is most useful on systems where most or all of those dependencies are already installed.

Installation - Source Code

Install MetaSanity using its Python 3 install script. Download the script from this repository.

wget https://raw.githubusercontent.com/cjneely10/MetaSanity/master/install.py -O install.py

Note: Users are strongly advised to use the default version of this software. However, users who wish to use the source code may install MetaSanity using python3 install.py -s sourcecode_installation.

Download script

MetaSanity relies on a series of databases to compute its evaluation and annotation pipelines.

The download-data.py script in the program package downloads all needed database files into the MetaSanity installation. For example, running python3 download-data.py downloads and extracts ~46GB of database files into the MetaSanity directory.
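Because the full download is large, it can help to confirm free disk space first. The following is a minimal sketch, not part of MetaSanity: the helper name has_room is hypothetical, and the 46 GB threshold is simply the approximate figure quoted above.

```python
import shutil

# Approximate size quoted above; treat it as a lower bound, not an exact figure.
REQUIRED_GB = 46


def has_room(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    # Assumption: this is run from inside the MetaSanity directory.
    print("enough space" if has_room(".") else "not enough space")
```

Pass the MetaSanity directory (or the intended database directory) as path when running from elsewhere.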

usage: download-data.py [-h] [-d DATA]

Download required MetaSanity data
Select from: gtdbtk,checkm,kofamscan,peptidase,virsorter

optional arguments:
  -h, --help            show this help message and exit
  -d DATA, --data DATA  Comma-separated list (no spaces) of databases to download, default all

Users may elect only to download a subset of these databases. For example, python3 download-data.py -d kofamscan,peptidase,virsorter will only download databases that are used in the FuncSanity pipeline. Similarly, python3 download-data.py -d gtdbtk,checkm will only download databases that are used in the PhyloSanity pipeline. Database availability is subject to the discretion of each program's hosting institution.
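The -d value must be a comma-separated list with no spaces. As an illustration of that rule, here is a minimal sketch of how such a value could be validated against the database names listed in the usage message. It is not the actual download-data.py implementation, and parse_selection is a hypothetical name.

```python
# Database names taken from the usage message above.
KNOWN = ("gtdbtk", "checkm", "kofamscan", "peptidase", "virsorter")


def parse_selection(value=None):
    """Turn a -d style string into a list of database names.

    None means "download everything", mirroring the documented default.
    """
    if value is None:
        return list(KNOWN)
    names = value.split(",")
    # A stray space (e.g. "checkm, gtdbtk") yields " gtdbtk", which is rejected here.
    unknown = [name for name in names if name not in KNOWN]
    if unknown:
        raise ValueError("unknown database name(s): " + ", ".join(repr(n) for n in unknown))
    return names


if __name__ == "__main__":
    print(parse_selection("gtdbtk,checkm"))
```

Under this sketch, "kofamscan,peptidase,virsorter" is accepted, while "kofamscan, peptidase" is rejected because of the space.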

Linking install to downloads

The install.py script configures the MetaSanity installation based on download location and installation type. Variables in MetaSanity.py are updated to reflect these values.

Confirm that the variable DOWNLOAD_DIRECTORY on line 29 in the MetaSanity.py script is the correct location of the program installation. For example, DOWNLOAD_DIRECTORY = "/correct/path/to/MetaSanity".

Users who have database files stored in directories not generated by the download-data.py script, or who have external BioMetaDB installations, should update these values as needed in MetaSanity.py. If the download-data.py script was used to download data to a directory other than the default MetaSanity values, update the databases directory location on line 50.
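Since line numbers can shift between versions, one way to confirm a variable such as DOWNLOAD_DIRECTORY is to look it up by name rather than by line. The sketch below is a hypothetical helper, not part of MetaSanity. It assumes the variable is assigned a plain literal at the top level of MetaSanity.py and that the script parses under the Python 3 interpreter in use.

```python
import ast


def read_assignment(source: str, name: str):
    """Return the literal value assigned to `name` at module level, or None.

    None is returned when the name is absent or its value is not a plain
    literal (for example, a function call or attribute lookup).
    """
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == name:
                try:
                    return ast.literal_eval(node.value)
                except ValueError:
                    return None
    return None


if __name__ == "__main__":
    example = 'DOWNLOAD_DIRECTORY = "/correct/path/to/MetaSanity"\n'
    print(read_assignment(example, "DOWNLOAD_DIRECTORY"))
```

To use it on a real installation, read the contents of MetaSanity.py into a string, pass it as source, and then check that the returned path exists with os.path.isdir.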

Source Code Installation

  • Python ≥ 3.5
    • sudo apt-get -y install python3.5 && sudo apt-get -y install python3-venv python3-pip && pip3 install --upgrade pip
  • Python 2
    • sudo apt-get -y install python2.7 && sudo apt-get -y install python-pip && pip install --upgrade pip
  • Python packages
    • Cython
    • argparse
    • configparser
    • luigi
    • pandas
    • BioMetaDB
    • sudo pip3 install Cython argparse configparser luigi pandas
  • External programs and their dependencies
    • PhyloSanity
      • CheckM
        • HMMER, prodigal, and pplacer are required to be on the user's system path. Uses Python2.7 to run.
        • Use checkm data setRoot on precalculated data files, available through the CheckM website.
        • pip2.7 install numpy scipy checkm-genome
        • Given the python requirement, this program is best contained within a separate python environment.
      • GTDBtk
        • Prodigal, HMMER, pplacer, FastANI, FastTree are required to be on the system path. Uses Python2.7 to run.
        • From website - "GTDB-Tk requires an environmental variable named GTDBTK_DATA_PATH to be set to the directory containing the data downloaded from https://data.ace.uq.edu.au/public/gtdbtk/."
        • pip2.7 install dendropy future matplotlib numpy scipy gtdbtk
        • Given the python requirement, this program is best contained within a separate python environment.
      • FastANI
        • Clone the program from GitHub and follow the directions within its INSTALL.txt file to compile.
    • FuncSanity
      • diamond
        • Download and extract the binary from the GitHub page.
      • Prodigal
        • sudo apt-get install prodigal
      • kofamscan
        • This program requires database files and program binaries to be configured. This GitHub repository, along with the program's README files, explains how to configure kofamscan.
      • BioData/KEGGDecoder
        • Clone or download the BioData repository.
      • PROKKA
        • sudo apt-get install libdatetime-perl libxml-simple-perl libdigest-md5-perl git default-jre bioperl
        • sudo cpan Bio::Perl
        • git clone https://github.com/tseemann/prokka.git $HOME/prokka
          • Can install in a different place than $HOME if desired.
        • $HOME/prokka/bin/prokka --setupdb
        • RNAmmer may optionally be installed
      • VirSorter
        • MetaSanity uses the Docker v1.0.5 version of virsorter.
        • Download and extract the required databases that are listed on the program's website.
        • Run docker pull simroux/virsorter:v1.0.5.
      • psortb
        • The commandline version of psortb is used in MetaSanity.
        • sudo docker pull brinkmanlab/psortb_commandline:1.0.2
        • Download the run script and change its file permissions
        • wget https://raw.githubusercontent.com/brinkmanlab/psortb_commandline_docker/master/psortb && chmod +x psortb
      • hmmer
        • sudo apt-get install hmmer
      • Required data is downloaded automatically by the download-data.py script.
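Several of the programs listed above must be on the system path. A quick way to see which are missing is sketched below. The executable names are assumptions based on each tool's usual command name (for example, hmmsearch for HMMER) and may differ on your system. The helper name missing_programs is hypothetical.

```python
import shutil

# Assumed executable names; adjust to match your installation.
REQUIRED = ["hmmsearch", "prodigal", "pplacer", "fastANI", "FastTree", "diamond", "docker"]


def missing_programs(names):
    """Return the subset of `names` that cannot be found on the system path."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed programs were found on PATH.")
```

Programs installed inside a separate Python 2.7 environment, as suggested for CheckM and GTDB-Tk, will only be found when that environment is active.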

Program installation

Important note for Source Code usage:

MetaSanity uses config files to run its analysis. Within the config file for your run, change the values of the PATH variables to reflect your system's installation. Otherwise, using the SourceCode is identical to using the Docker installation. See the full example for more information.
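Before a run, the PATH values in a config file can be checked for entries that do not resolve. The sketch below assumes the config file is INI-style, with one section per program and a PATH key in each section. That layout is an assumption here, so compare it against the full example before relying on it. The helper name invalid_paths and the section names shown are hypothetical.

```python
import configparser
import os
import shutil


def invalid_paths(config_text: str) -> dict:
    """Map section name -> PATH value for every PATH that does not resolve.

    A PATH counts as valid if it exists on disk or names an executable
    that can be found on the system path.
    """
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep key case, so "PATH" stays "PATH"
    parser.read_string(config_text)
    bad = {}
    for section in parser.sections():
        path = parser[section].get("PATH")
        if path is None:
            continue  # section has no PATH key
        if not os.path.exists(path) and shutil.which(path) is None:
            bad[section] = path
    return bad


if __name__ == "__main__":
    # Hypothetical config content for illustration only.
    sample = "[CHECKM]\nPATH = /no/such/dir/checkm\n"
    print(invalid_paths(sample))
```

An empty result means every PATH entry resolved. Any section reported should be corrected in the config file before running MetaSanity.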

Prior to first run

Prior to running your MetaSanity.py script, ensure ALL of the following:

  • The DOWNLOAD_DIRECTORY is the correct location of the MetaSanity directory (line 29).
  • The VERSION is v1.2.0 (line 31).

If you are using any of the recommended programs listed above, make sure that their paths are correct (lines 38-42).

If you installed databases in a location other than the MetaSanity installation directory, ensure that lines 50-60 are valid.

Ensure that line 34 is the location of the pipedm.py script.

Your installation is complete! See the full example to test your installation and to learn how to use MetaSanity! Learn more about PhyloSanity and FuncSanity on their wiki pages!