Jaci Saunders edited this page May 2, 2020 · 6 revisions

1: Digest and ingest data

  1. Run bin/digest_and_ingest.sh on the FASTA proteome files you wish to digest and ingest, e.g.:
bin/digest_and_ingest.sh file1.fasta file2.fasta ...

The script reads each FASTA file and runs digestions on its sequences. Expect a fair amount of output while the files are processed.
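Before a long ingest run, it can help to check that each input file really contains FASTA records. The following is a minimal sketch, not part of the project's scripts: it relies only on the fact that every FASTA record begins with a `>` header line. The temporary directory, the file name example.fasta, and the two protein records are hypothetical and exist only to make the example self-contained.

```shell
# Illustrative input only: a tiny two-record FASTA file in a temp directory.
workdir=$(mktemp -d)
cat > "$workdir/example.fasta" <<'EOF'
>protein_1 hypothetical example record
MKTAYIAKQRQISFVKSHFSRQ
>protein_2 hypothetical example record
MSDNELKRLLEEAGK
EOF

# Count records per file: each FASTA record starts with a '>' header line.
# '|| true' keeps the loop going when a file has no headers (grep exits 1).
for f in "$workdir"/*.fasta; do
  n=$(grep -c '^>' "$f" || true)
  echo "$(basename "$f"): $n sequences"
done

# Clean up the illustrative data.
rm -rf "$workdir"
```

To check real inputs, point the loop at your own FASTA files instead of the temporary directory. A count of 0 suggests the file is empty or not in FASTA format.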

2: Generate redundancy tables

  1. List the available taxon IDs by querying the database, e.g.:
bin/list_taxon_ids.sh
  2. Generate redundancy tables for groups of taxa, e.g.:
bin/generate_redundancy_tables.sh --taxon-ids syn8102 syn7502 syn7503 --output-dir exampleRedundancyTables

Note that you can also specify a file that contains a list of taxon IDs, e.g.:

bin/generate_redundancy_tables.sh --taxon-id-file taxon_id_list.txt --output-dir exampleRedundancyTables
  3. View the resulting files in exampleRedundancyTables:
    • redundancy.db.sqlite contains the redundancy information
    • counts.csv contains counts of redundant peptides
    • percents.csv contains the values in counts.csv, divided by the number of unique peptides in the union of the digestions of a taxon pair.
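
The relationship between counts.csv and percents.csv can be illustrated with a small worked example. This is a sketch under stated assumptions, not the project's implementation: it assumes a "redundant" peptide is one that appears in the digestions of both taxa in a pair, and the peptide names and file names are hypothetical. The resulting value is the count divided by the size of the union, as described above, so it is a fraction between 0 and 1.

```shell
# Use the C locale so sort and comm agree on ordering.
export LC_ALL=C

# Hypothetical peptide sets for two taxa (illustrative data only).
tmpdir=$(mktemp -d)
printf '%s\n' PEPTIDEA PEPTIDEB PEPTIDEC | sort -u > "$tmpdir/taxon_a.txt"
printf '%s\n' PEPTIDEB PEPTIDEC PEPTIDED PEPTIDEE | sort -u > "$tmpdir/taxon_b.txt"

# Assumption: a "redundant" peptide is one present in both digestions.
# comm -12 prints only the lines common to both sorted files.
shared=$(comm -12 "$tmpdir/taxon_a.txt" "$tmpdir/taxon_b.txt" | wc -l | tr -d ' ')

# Number of unique peptides in the union of the two digestions.
union=$(sort -u "$tmpdir/taxon_a.txt" "$tmpdir/taxon_b.txt" | wc -l | tr -d ' ')

# Value as described for percents.csv: count divided by union size.
ratio=$(awk -v s="$shared" -v u="$union" 'BEGIN { printf "%.2f", s / u }')
echo "shared=$shared union=$union ratio=$ratio"

# Clean up the illustrative data.
rm -rf "$tmpdir"
```

Here two peptides are shared and the union holds five unique peptides, so the ratio is 2 / 5 = 0.40.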

3: Remove taxa from the database

If you wish to delete data for a given set of taxa in the database, run a command like this:

bin/clear_taxon_data.sh --taxon-ids taxa_name