Quick Usage Guide

Jaci Saunders edited this page May 2, 2020 · 6 revisions

1: Digest and ingest data

Run the script bin/digest_and_ingest.sh with the FASTA proteome files you wish to digest and ingest, e.g.:

bin/digest_and_ingest.sh file1.fasta file2.fasta ...

This script reads the FASTA files and runs digestions on their sequences. Expect a fair amount of output as the files are processed.
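Because ingestion can take a while, it may help to sanity-check the input files first. The sketch below is an optional pre-flight check and is not part of the repository's scripts: `check_fasta_files` is a hypothetical helper that counts FASTA header lines (lines starting with `>`) in each file and fails if a file is missing or contains no records.

```shell
# Hypothetical helper, not shipped with the repository.
# Prints "<file>: <n> sequences" for each readable file and returns
# non-zero if any file is unreadable or has no FASTA header lines.
check_fasta_files() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
            continue
        fi
        # grep -c exits non-zero when the count is 0, so guard it.
        n=$(grep -c '^>' "$f" || true)
        echo "$f: $n sequences"
        [ "$n" -gt 0 ] || status=1
    done
    return "$status"
}
```

One possible use is to chain it in front of the real command: `check_fasta_files file1.fasta file2.fasta && bin/digest_and_ingest.sh file1.fasta file2.fasta`.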

2: Generate redundancy tables

List the available taxon IDs by querying the database, e.g.:

bin/list_taxon_ids.sh

Generate redundancy tables for groups of taxa, e.g.:

bin/generate_redundancy_tables.sh --taxon-ids syn8102 syn7502 syn7503 --output-dir exampleRedundancyTables

You can also specify a file that contains a list of taxon IDs, e.g.:

bin/generate_redundancy_tables.sh --taxon-id-file taxon_id_list.txt --output-dir exampleRedundancyTables
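The page does not describe the layout of the taxon ID file. As a sketch only, assuming the script accepts one taxon ID per line (an assumption worth confirming against the script source), such a file could be written like this. `write_taxon_id_file` is a hypothetical helper, and the IDs are the ones used in the example above.

```shell
# Hypothetical helper: write each argument after the first as one line
# of the output file. ASSUMPTION: one taxon ID per line is the expected
# format; this page does not confirm it.
write_taxon_id_file() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

write_taxon_id_file taxon_id_list.txt syn8102 syn7502 syn7503
```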

View the resulting files in exampleRedundancyTables:
- redundancy.db.sqlite contains the redundancy information
- counts.csv contains the counts of redundant peptides for each pair of taxa
- percents.csv contains the values in counts.csv, each divided by the number of unique peptides in the union of the digestions of that taxon pair.
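To make the relationship between the two CSV files concrete, here is a worked example of the division described above. It is a sketch under stated assumptions: the count for a pair is taken to be the number of peptides shared by both digestions, the union size is derived by inclusion-exclusion, and the result is shown as a fraction (whether percents.csv stores a fraction or a value scaled by 100 is not stated on this page). `redundancy_fraction` is a hypothetical helper and the numbers are made up for illustration.

```shell
# redundancy_fraction SHARED UNIQUE_A UNIQUE_B
#   SHARED:   peptides present in both digestions (assumed meaning of a counts.csv cell)
#   UNIQUE_A: unique peptides in taxon A's digestion
#   UNIQUE_B: unique peptides in taxon B's digestion
# Union size by inclusion-exclusion: UNIQUE_A + UNIQUE_B - SHARED.
redundancy_fraction() {
    awk -v s="$1" -v a="$2" -v b="$3" \
        'BEGIN { printf "%.4f\n", s / (a + b - s) }'
}

# Illustrative numbers: 20 shared peptides, 100 and 80 unique peptides.
redundancy_fraction 20 100 80   # 20 / (100 + 80 - 20) = 20 / 160, prints 0.1250
```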

3: Remove taxa from the database

If you wish to delete data for a given set of taxa in the database, run a command like this:

bin/clear_taxon_data.sh --taxon-ids taxa_name