Jaci Saunders edited this page Sep 19, 2020 · 6 revisions

The following example shows how METATRYP v. 1 can be used to identify peptides shared among taxa, using a pre-assembled database of marine microorganisms. More information about the organisms in this database is available on the Tutorial Database page.

This tutorial assumes that you have followed the Installation Instructions and built the tutorial database by following step 4.B. of the installation. It also assumes that the main directory /metatryp-master is your current working directory.

1. List taxa in the database:

./bin/list_taxons.sh

This will list all the taxa in your database alphabetically. Example output:

Aciduliprofundum_boonei_T469
Alcanivorax_sp_DG881
Algoriphagus_machipongonensis_PR1
Alphaproteobacteria_sp_SAR11_HIMB5
Alteromonas_australica_H17
Alteromonas_macleodii_Black_Sea_11
Alteromonas_macleodii_D7
Alteromonas_macleodii_DE
Alteromonas_macleodii_HOT1A3
Alteromonas_macleodii_MIT1002
...

2. Generate a redundancy table for two taxa:

./bin/generate_redundancy_tables.sh --taxon-ids Alteromonas_macleodii_D7 Alteromonas_macleodii_DE --output-dir example2taxaRedundancy

3. Generate a redundancy table for the taxa listed in a file of taxon IDs:

./bin/generate_redundancy_tables.sh --taxon-id-file exampleTaxa.txt --output-dir example2taxaRedundancy

The file should be a plain text file with one taxon ID per line (for example, Alteromonas_macleodii_D7).
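The taxon-ID file for step 3 can be written with a short shell snippet. This is a minimal sketch that reuses the two taxa from step 2; the IDs are only examples, and any IDs printed by ./bin/list_taxons.sh can be used instead. The commented grep alternative assumes that list_taxons.sh prints only taxon IDs, one per line, as in the example output of step 1.

```shell
#!/usr/bin/env bash
# Write a taxon-ID file for step 3: a plain text file, one taxon ID per line.
# These two IDs are the ones used in step 2; replace them with any IDs
# printed by ./bin/list_taxons.sh.
printf '%s\n' \
  Alteromonas_macleodii_D7 \
  Alteromonas_macleodii_DE \
  > exampleTaxa.txt

# Alternative (assumes list_taxons.sh prints only taxon IDs, one per line):
# ./bin/list_taxons.sh | grep '^Alteromonas_macleodii' > exampleTaxa.txt

# Show how many IDs the file contains.
wc -l < exampleTaxa.txt
```

The resulting exampleTaxa.txt can then be passed to --taxon-id-file as in the command above.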

4. Query taxon sequences for the presence of a peptide:

A. Basic Query:

This searches the database for the peptide using an exact string match. This search is fast.

./bin/query_by_sequence.sh --sequence VAAEAVLSMTK

B. "Fuzzy" Query:

You can also use "fuzzy" searching to allow for amino acid mismatches. Use the --max-distance flag to set the maximum number of substitution sites allowed. The distance is based on a Levenshtein distance calculation, not on biological amino acid substitution models. Note that fuzzy searching can be quite slow. Use --sequence to search a single peptide sequence, or --sequence-file to search multiple peptides at once (the file should contain one peptide sequence per line).

./bin/query_by_sequence_distance.sh --sequence VAAEAVLSMTK --max-distance 2
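To get a feel for what a given --max-distance value allows, the distance between two peptides can be computed directly. The sketch below is an illustration only, not METATRYP's code: it implements the standard Levenshtein (edit) distance in awk, which counts substitutions, insertions, and deletions, and METATRYP's exact counting rules may differ. The function name lev and the peptide VAGEAVLSMSK (a made-up variant of VAAEAVLSMTK with two substituted residues) are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: standard Levenshtein (edit) distance between two peptides.
# Not METATRYP's implementation; its distance rules may differ.
lev() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a); m = length(b)
    # d[i,j] = edit distance between the first i residues of a and the first j of b
    for (i = 0; i <= n; i++) d[i,0] = i
    for (j = 0; j <= m; j++) d[0,j] = j
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        best = d[i-1,j] + 1                                       # deletion
        if (d[i,j-1] + 1 < best) best = d[i,j-1] + 1              # insertion
        if (d[i-1,j-1] + cost < best) best = d[i-1,j-1] + cost    # match or substitution
        d[i,j] = best
      }
    }
    print d[n,m]
  }'
}

lev VAAEAVLSMTK VAAEAVLSMTK   # identical peptides: prints 0
lev VAAEAVLSMTK VAGEAVLSMSK   # two substituted residues: prints 2
```

Assuming the --max-distance limit is inclusive, a database peptide like the two-substitution variant above would fall within range of the --max-distance 2 query.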

5. Make a clean database:

METATRYP searches the SQLite database file named proteomics.db.sqlite. To start with a clean database, move, rename, or delete the tutorial database file proteomics.db.sqlite, then initialize a new SQLite database by following step 4.A. of the installation instructions.
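If you would rather keep the tutorial database than delete it, a small shell snippet can move it aside. This is a minimal sketch assuming proteomics.db.sqlite sits in the current working directory (metatryp-master); adjust the path if your installation stores it elsewhere. The backup file name is a hypothetical choice.

```shell
#!/usr/bin/env bash
# Move the tutorial database aside instead of deleting it, so it can be restored later.
# Assumes proteomics.db.sqlite is in the current working directory.
db=proteomics.db.sqlite
if [ -f "$db" ]; then
  backup="${db}.tutorial.bak"   # hypothetical backup name; any name works
  mv -- "$db" "$backup"
  echo "Moved $db to $backup"
else
  echo "No $db found in $(pwd); nothing to move."
fi
# Next: initialize a new database with step 4.A. of the installation instructions.
```

To restore the tutorial database later, move the backup file back to proteomics.db.sqlite.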


Example Database:

The predicted proteome of each taxon in the database was obtained from the Joint Genome Institute's Integrated Microbial Genomes & Microbiomes Database (JGI IMG). More information on the taxa is available on the database info page.

Taxa in Database:

Aciduliprofundum boonei T469
Alcanivorax sp. DG881
Algoriphagus machipongonensis PR1
Alphaproteobacteria sp. SAR11 HIMB5
Alteromonas australica H17
Alteromonas macleodii Black Sea 11
Alteromonas macleodii D7
Alteromonas macleodii DE
Alteromonas macleodii HOT1A3
Alteromonas macleodii MIT1002
Alteromonas macleodii REDSEA-S09_B2
Alteromonas macleodii REDSEA-S12_B5
Alteromonas macleodii REDSEA-S15_B11
Alteromonas mediterranea DE
Alteromonas sp. ALT199
Bacillus sp. B14905
Bacillus sp. NRRL B-14911
Bacillus sp. SG-1
Beggiatoa sp. Orange Guaymas
Brevundimonas sp. BAL3
Caminibacter mediatlanticus TB-2
Candidatus Nitrosopelagicus brevis CN25
Candidatus Pelagibacter ubique SAR11 HTCC1062
Carnobacterium sp. AT7
Citreicella sp. SE45
Congregibacter litoralis KT71
Crocosphaera watsonii WH 0003
Crocosphaera watsonii WH 0005
Crocosphaera watsonii WH 0401 (draft1)
Crocosphaera watsonii WH 0402
Crocosphaera watsonii WH 8501
Crocosphaera watsonii WH 8502
Cyanobium sp. PCC 7001
Erythrobacter litoralis HTCC2594
Erythrobacter sp. NAP1
Erythrobacter sp. SD-21
Flavobacteria bacterium BAL38
Flavobacteria bacterium BBFL7
Fulvimarina pelagi HTCC2506
Gammaproteobacteria sp. OM60 HIMB55
Hydrogenivirga sp. 128-5-R1-1
Idiomarina baltica OS145
Janibacter sp. HTCC2649
Kordia algicida OT-1
Lentisphaera araneosa HTCC2155
Limnobacter sp. MED105
Loktanella sp. SE62
Loktanella vestfoldensis SKA53
marine actinobacterium PHSC20C1
Marinitoga piezophila KA3
Microbulbifer agarilyticus S89
Nitrobacter hamburgensis X14
Nitrococcus mobilis Nb-231
Nitrosopelagicus brevis 1
Nitrosopelagicus sp. REDSEA-S32_B2
Nitrosopumilus maritimus SCM1
Nitrospina sp. AB-629-B18
Nitrospina sp. SCGC AAA799-A02
Nitrospina sp. SCGC AAA799-C22
Nitrospira defluvii
Planctomycetaceae bacterium KSU-1
Prochlorococcus marinus MIT9312
Prochlorococcus sp. MIT9201
Prochlorococcus sp. MIT9202
Prochlorococcus sp. MIT9211
Prochlorococcus sp. MIT9215
Prochlorococcus sp. MIT9301
Prochlorococcus sp. MIT9302
Prochlorococcus sp. MIT9303
Prochlorococcus sp. MIT9311
Prochlorococcus sp. MIT9314
Prochlorococcus sp. MIT9321
Prochlorococcus sp. MIT9322
Prochlorococcus sp. MIT9401
Prochlorococcus sp. MIT9515
Prochlorococcus sp. NATL1A
Prochlorococcus sp. NATL2A
Prochlorococcus sp. SB
Prochlorococcus sp. SS35
Prochlorococcus sp. SS51
Prochlorococcus sp. SS52
Pseudoalteromonas arctica A 37-1-2
Pseudoalteromonas atlantica T6c
Pseudoalteromonas atlantica TB41
Pseudoalteromonas denitrificans DSM 6059
Pseudoalteromonas piscicida ATCC 15057
Pseudoalteromonas rubra ATCC 29570
Pseudoalteromonas rubra S2471
Richelia intracellularis RC01
Roseobacter litoralis Och 149
Roseobacter sp. AzwK-3b
Roseobacter sp. CCS2
Roseobacter sp. GAI101
Roseobacter sp. MED193
Roseobacter sp. SK209-2-6
Roseovarius nubinhibens ISM
Roseovarius sp. TM1035
Sagittula stellata E-37
SAR116 cluster alpha proteobacterium sp. HIMB100
SAR116 cluster alphaproteobacterium REDSEA-S10_B10N8
SAR116 cluster alphaproteobacterium REDSEA-S2_B12
Shewanella benthica KT99
Sphingomonas sp. SKA58
Stenotrophomonas sp. SKA14
Sulfitobacter sp. EE-36
Sulfitobacter sp. NAS-14.1
Sulfurospirillum sp. Am-N (draft)
Synechococcus elongatus PCC 6301
Synechococcus elongatus PCC 7942
Synechococcus sp. 7002
Synechococcus sp. BL107
Synechococcus sp. CB0101
Synechococcus sp. CB0205
Synechococcus sp. CC9311
Synechococcus sp. CC9605
Synechococcus sp. CC9616
Synechococcus sp. CC9902
Synechococcus sp. PCC 7336
Synechococcus sp. RS9916
Synechococcus sp. RS9917
Synechococcus sp. WH 8016
Synechococcus sp. WH 8109
Synechococcus sp. WH5701
Synechococcus sp. WH7803
Synechococcus sp. WH7805
Synechococcus sp. WH8102
Thalassobium sp. R2A62
Thermococcus sp. AM4
Trichodesmium erythraeum IMS101
Trichodesmium thiebautii H9-4
Vibrio alginolyticus 12G01
Vibrio campbellii AND4
Vibrio fischeri MJ11
Vibrio parahaemolyticus 16
Vibrio sp. MED222
Vibrio splendidus 12B01
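The names above contain spaces and periods, while commands such as generate_redundancy_tables.sh take the underscore-style IDs printed by ./bin/list_taxons.sh. Judging only from the ten IDs shown in step 1, the conversion appears to drop periods and replace spaces with underscores. The helper below (to_taxon_id is a hypothetical name) applies that inferred rule; it is not part of METATRYP, and names with periods inside strain IDs, parentheses, or hyphens (for example, Sulfitobacter sp. NAS-14.1) may not convert correctly, so check the result against the output of ./bin/list_taxons.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of METATRYP: convert a display name from the
# list above into the underscore-style ID format seen in ./bin/list_taxons.sh.
# The rule (drop periods, spaces -> underscores) is inferred from a few examples only.
to_taxon_id() {
  printf '%s\n' "$1" | sed -e 's/\.//g' -e 's/ /_/g'
}

to_taxon_id "Alcanivorax sp. DG881"                # prints Alcanivorax_sp_DG881
to_taxon_id "Alteromonas macleodii Black Sea 11"   # prints Alteromonas_macleodii_Black_Sea_11
```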