Skip to content
Jaci Saunders edited this page May 2, 2020 · 2 revisions

Using the Database

The main database is a SQLite standalone database file proteomics.db.sqlite. The redundancy database is a SQLite standalone database file redundancy.db.sqlite.

If sqlite installed, then these can be queried directly via sqlite: sqlite3 redundancy.db.sqlite or sqlite3 proteomics.db.sqlite

Use of the sqlite3 client is out of the scope of this document. Run the command 'man sqlite3' for more information about the sqlite3 client. (type 'q' to leave the man page)

For general SQL tutorials, try this site: http://sqlzoo.net/

If you plan to use the database, it is recommended that you understand the 'Database Schema' section. Below are Example Queries:


Identify all taxa in the redundancy table database

SELECT 
 taxon.id 
FROM 
 taxon

Count unique peptides in a taxon digest

SELECT
 count(*)
FROM
 taxon_digest_peptide
 JOIN taxon_digest ON taxon_digest_peptide.taxon_digest_id = taxon_digest.id
 JOIN taxon ON taxon_digest.taxon_id = taxon.id
 JOIN digest on taxon_digest.digest_id = digest.id
WHERE
 taxon.id = 'syn7805'
 AND digest.id = 1

List Peptides for a given taxon digest

SELECT 
 taxon.id, peptide.sequence
FROM
 taxon_digest_peptide
 JOIN peptide ON taxon_digest_peptide.peptide_id = peptide.id
 JOIN taxon_digest ON taxon_digest_peptide.taxon_digest_id = taxon_digest.id
 JOIN taxon ON taxon_digest.taxon_id = taxon.id
 JOIN digest on taxon_digest.digest_id = digest.id
WHERE
 taxon.id = 'syn7805'
 AND digest.id = 1

List taxa whose digests produce a specific peptide sequence

SELECT 
 peptide.sequence, taxon.id
FROM
 taxon_digest_peptide
 JOIN peptide ON taxon_digest_peptide.peptide_id = peptide.id
 JOIN taxon_digest ON taxon_digest_peptide.taxon_digest_id = taxon_digest.id
 JOIN taxon ON taxon_digest.taxon_id = taxon.id
 JOIN digest on taxon_digest.digest_id = digest.id
WHERE
 peptide.sequence = 'ATLQP'
 AND digest.id = 1
ORDER BY
 taxon.id

Count number of proteins that occur multiple times in a taxon proteome

SELECT 
 COUNT(*) AS multiply_occuring_proteins
FROM (
 SELECT 
  COUNT(protein.id) AS num_instances
 FROM
  taxon_protein
  JOIN taxon ON taxon_protein.taxon_id = taxon.id
  JOIN protein ON taxon_protein.protein_id = protein.id
 WHERE
  taxon.id = 'syn7805'
 GROUP BY
  protein.id
 HAVING
  num_instances > 1
) AS subquery