Installation Instructions

Jaci Saunders edited this page Oct 1, 2020 · 7 revisions

These instructions assume that you are running within a Unix operating system.

1. Download this Repository

2. Create a virtual environment

  • Create a virtual environment with conda: conda env create -f environment.yml
    • Activate the environment: conda activate metatryp2_env
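
Before continuing, it can help to confirm that the environment is actually active. The sketch below is not part of METATRYP; it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, and the function names are hypothetical.

```python
import os

# Environment name used in the activation step above.
EXPECTED_ENV = "metatryp2_env"


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(environ=os.environ, expected=EXPECTED_ENV):
    """True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_expected_env():
        print("conda environment is active:", EXPECTED_ENV)
    else:
        print("active environment is", active_conda_env(), "- run: conda activate", EXPECTED_ENV)
```

Passing the environment mapping as an argument keeps the check easy to try without changing the real shell.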

3. Set up PostgreSQL to build the database from scratch.

  • PostgreSQL must already be installed on your machine. It comes pre-installed on many Linux operating systems, such as Ubuntu.
  • If you do not have PostgreSQL installed already, see: Installation guide for PostgreSQL
  • Create a new PostgreSQL user for METATRYP. In this tutorial the user is called metatryp2_user and the password is set to peptides.
$ createuser --interactive metatryp2_user -P
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) y
  • Check that the user was created successfully:
    • Start the PostgreSQL prompt as the admin user postgres (requires sudo privileges).
      • $ sudo -u postgres psql
      • Once in the PostgreSQL prompt, look up all users with the \du command
      • Output should resemble the following:
      $ sudo -u postgres psql
      [sudo] password for userX:
      psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
      Type "help" for help.
      
      postgres=# \du
                                            List of roles
         Role name    |                         Attributes                         | Member of
      ----------------+------------------------------------------------------------+-----------
       metatryp2_user | Superuser, Create role, Create DB                          | {}
       postgres       | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
      
  • Create an empty PostgreSQL database:
    • While still in the PostgreSQL prompt: postgres=# CREATE DATABASE metatryp2_proteins;
    • Check that the database was created successfully: postgres=# \l
    • Should produce similar output to the following:
     postgres=# \l
                                     List of databases
             Name        |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges
     --------------------+----------+----------+-------------+-------------+-------------------
      metatryp2_proteins | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
      postgres           | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
    
    • Grant the user you created privileges on this new database: postgres=# GRANT ALL PRIVILEGES ON DATABASE metatryp2_proteins TO metatryp2_user;
    • Check that the privileges were granted successfully: postgres=# \l
    • Should resemble the following:
     postgres=# GRANT ALL PRIVILEGES ON DATABASE metatryp2_proteins TO metatryp2_user;
     GRANT
     postgres=# \l
                                          List of databases
             Name        |  Owner   | Encoding |   Collate   |    Ctype    |      Access privileges
     --------------------+----------+----------+-------------+-------------+-----------------------------
      metatryp2_proteins | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres               +
                         |          |          |             |             | postgres=CTc/postgres      +
                         |          |          |             |             | metatryp2_user=CTc/postgres
      postgres           | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
    
    • Quit the PostgreSQL prompt and return to the working directory: postgres=# \q
  • Open the file /conf/db_config.py and add the connection settings for PostgreSQL. If you followed this tutorial exactly, they are:
     #Connection Settings for the metatryp database, need to be set for program to run
     DB_NAME="metatryp2_proteins"
     DB_CONN_PREFIX="postgresql"
     DB_HOST="localhost"
     DB_PORT="5432"
     DB_PASS="peptides"
     DB_USER="metatryp2_user"
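
The settings above have the shape of a standard PostgreSQL connection URL. As a minimal sketch, assuming they are combined in the usual prefix://user:password@host:port/dbname form (the source does not confirm how METATRYP uses them), the hypothetical helper build_db_url below shows how the values fit together and percent-encodes the user and password so that special characters cannot break the URL.

```python
from urllib.parse import quote

# Values copied from the tutorial's db_config.py settings.
DB_NAME = "metatryp2_proteins"
DB_CONN_PREFIX = "postgresql"
DB_HOST = "localhost"
DB_PORT = "5432"
DB_PASS = "peptides"
DB_USER = "metatryp2_user"


def build_db_url(prefix, user, password, host, port, name):
    """Assemble a connection URL (hypothetical helper, not METATRYP's API)."""
    # Percent-encode credentials so characters such as '@' or '/' stay inside their field.
    user_enc = quote(user, safe="")
    pass_enc = quote(password, safe="")
    return f"{prefix}://{user_enc}:{pass_enc}@{host}:{port}/{name}"


if __name__ == "__main__":
    print(build_db_url(DB_CONN_PREFIX, DB_USER, DB_PASS, DB_HOST, DB_PORT, DB_NAME))
```

If a connection later fails, comparing each field of this URL against the role and database created earlier is a quick first check.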
    

4. Add metatryp schema to the empty database

  • If building a database from the pre-built database with 136 pre-loaded marine microbial genomes, follow the instructions at: Tutorial.
  • OR, if building a database from scratch, use the following:
         $ pg_restore --dbname=metatryp2_proteins --no-owner db_schema/metatryp_base_schema.backup
    
    • Check that the database successfully imported the schema:
     $ sudo -u postgres psql
     postgres=# \connect metatryp2_proteins
     metatryp2_proteins=# \dt
    
    • Output should resemble the following:
     $ sudo -u postgres psql
     [sudo] password for userX:
     psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
     Type "help" for help.
    
     postgres=# \connect metatryp2_proteins
     You are now connected to database "metatryp2_proteins" as user "postgres".
     metatryp2_proteins=# \dt
                            List of relations
      Schema |                Name                 | Type  |  Owner
     --------+-------------------------------------+-------+---------
      public | digest                              | table | protein
      public | metagenome                          | table | protein
      public | metagenome_annotations              | table | protein
      public | metagenome_sequence                 | table | protein
      public | metagenome_sequence_digest_peptide  | table | protein
      public | metagenome_taxon                    | table | protein
      public | ncbi_taxonomy                       | table | protein
      public | peptide                             | table | protein
      public | protease                            | table | protein
      public | protein                             | table | protein
      public | protein_digest                      | table | protein
      public | protein_digest_peptide              | table | protein
      public | specialized_assembly                | table | protein
      public | specialized_assembly_digest_peptide | table | protein
      public | specialized_assembly_sequence       | table | protein
      public | taxon                               | table | protein
      public | taxon_digest                        | table | protein
      public | taxon_digest_peptide                | table | protein
      public | taxon_protein                       | table | protein
     (19 rows)
    
     metatryp2_proteins=# \q
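
Comparing the \dt listing by eye against the 19 expected tables is error-prone. The sketch below is an illustration rather than part of METATRYP: it parses the text output of \dt and reports any table from the listing above that is missing. The function names are hypothetical, and the parser assumes psql's default aligned output with "|" column separators.

```python
# Table names taken from the \dt listing above.
EXPECTED_TABLES = {
    "digest", "metagenome", "metagenome_annotations", "metagenome_sequence",
    "metagenome_sequence_digest_peptide", "metagenome_taxon", "ncbi_taxonomy",
    "peptide", "protease", "protein", "protein_digest", "protein_digest_peptide",
    "specialized_assembly", "specialized_assembly_digest_peptide",
    "specialized_assembly_sequence", "taxon", "taxon_digest",
    "taxon_digest_peptide", "taxon_protein",
}


def parse_dt_tables(dt_output):
    r"""Extract table names from the text printed by psql's \dt command."""
    tables = set()
    for line in dt_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have four columns: Schema | Name | Type | Owner.
        if len(cells) == 4 and cells[2] == "table":
            tables.add(cells[1])
    return tables


def missing_tables(dt_output, expected=EXPECTED_TABLES):
    """Return the expected tables that do not appear in the listing, sorted."""
    return sorted(expected - parse_dt_tables(dt_output))


if __name__ == "__main__":
    import sys
    missing = missing_tables(sys.stdin.read())
    print("schema looks complete" if not missing else "missing tables: " + ", ".join(missing))
```

Saved under a name of your choosing, the script can read the listing from standard input, for example by piping in the output of psql run with -c '\dt' against metatryp2_proteins.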