The Peptonizer 2000

Integrating PepGM and Unipept for probability-based taxonomic inference of metaproteomic samples

Table of Contents

About The Project
Input
Getting Started
Usage
Roadmap
Contributing
License
Contact

About The Project

The Peptonizer2000 is a tool that combines the capabilities of Unipept and PepGM to analyze metaproteomic mass spectrometry-based samples. PepGM was originally designed for taxonomic inference of viral mass spectrometry-based samples; we've extended its functionality to analyze metaproteomic samples by retrieving taxonomic information from the Unipept database.

PepGM is a probabilistic graphical model developed by the eScience group at BAM (Federal Institute for Materials Research and Testing) that uses belief propagation to infer the taxonomic origin of peptides, and thus the taxa present, in viral samples. You can learn more about PepGM on our GitHub page.

Unipept, on the other hand, is a web-based metaproteomics analysis tool that provides taxonomic information for identified peptides. To make it work seamlessly with PepGM, we've extended Unipept with new functionalities that restrict the taxa queried and provide all potential taxonomic origins of the peptides queried. Check out more information about Unipept here.

With the Peptonizer2000, you can look forward to a comprehensive and streamlined workflow that simplifies the process of identifying peptides and their taxonomic origins in metaproteomic samples.

The Peptonizer2000 workflow consists of the following steps:

Query all identified peptides, provided by the user in a .tsv file, in the Unipept API, and restrict the taxonomic range queried based on any prior knowledge of the sample.
Assemble the peptide-taxon associations provided by Unipept into a bipartite graph, where peptides and taxa are represented by different nodes, and an edge is drawn between a peptide and a taxon if the peptide is part of the taxon's proteome.
Transform the bipartite graph into a factor graph using convolution trees and conditional probability table factors (CPD).
Run the belief propagation algorithm multiple times with different sets of CPD parameters until convergence, to obtain posterior probabilities of candidate taxa.
Use an empirically deduced metric to determine the ideal graph parameter set.
Output the top scoring taxa as a results barchart. The results are also available as comma-separated files for further downstream analysis or visualizations.
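
To illustrate the graph assembly step above, here is a minimal sketch of how peptide-taxon associations could be turned into a bipartite graph. It is not the Peptonizer2000 implementation: the function names, the peptide sequences, and the taxon labels are all hypothetical, and the associations are made up rather than retrieved from Unipept.

```python
from collections import defaultdict


def build_bipartite_graph(peptide_to_taxa):
    """Return (peptide_nodes, taxon_nodes, edges) for a peptide -> taxa mapping.

    An edge (peptide, taxon) means the peptide is part of that taxon's proteome.
    """
    peptide_nodes = set()
    taxon_nodes = set()
    edges = set()
    for peptide, taxa in peptide_to_taxa.items():
        peptide_nodes.add(peptide)
        for taxon in taxa:
            taxon_nodes.add(taxon)
            edges.add((peptide, taxon))
    return peptide_nodes, taxon_nodes, edges


def taxon_degrees(edges):
    """Count how many peptides are connected to each taxon."""
    degrees = defaultdict(int)
    for _, taxon in edges:
        degrees[taxon] += 1
    return dict(degrees)


# Made-up associations, for illustration only (not real Unipept output).
associations = {
    "PEPTIDEA": ["taxon_A", "taxon_B"],
    "PEPTIDEB": ["taxon_A"],
    "PEPTIDEC": ["taxon_C"],
}

peptides, taxa, edges = build_bipartite_graph(associations)
degrees = taxon_degrees(edges)
```

Keeping peptides and taxa in separate node sets mirrors the bipartite structure described above: edges only ever connect a peptide to a taxon, never two nodes of the same kind.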

Input

A .tsv file of your peptides, output from any proteomic peptide search method. The first column should be the peptide, the second column its score as attributed by the search engine. An example is provided in the test files.
A config file with your parameters for the Peptonizer2000. A more detailed description of the configuration file can be found below. Additionally, an example config file is provided in this repository.
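
As a rough illustration of the expected peptide input, the sketch below parses a two-column, tab-separated table into (peptide, score) pairs. It assumes the score is numeric and silently skips lines whose second column is not a number (for example a header line, if your file has one). The function name and the example peptides and scores are invented for this sketch.

```python
import csv
import io


def read_peptide_scores(handle):
    """Parse tab-separated (peptide, score) lines into a list of tuples."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) < 2:
            continue  # skip blank or incomplete lines
        peptide, score = record[0].strip(), record[1].strip()
        try:
            rows.append((peptide, float(score)))
        except ValueError:
            continue  # non-numeric score, e.g. a header line
    return rows


# Invented example content standing in for a real .tsv file.
example = io.StringIO("AAAGELK\t0.98\nLLDEVTR\t0.75\n")
parsed = read_peptide_scores(example)
```

For a file on disk, the same function can be called with `open(path, newline="")` instead of the in-memory example.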

Getting Started

Prerequisites

Make sure you have git installed and clone the repo:

git clone https://github.com/BAMeScience/Peptonizer2000.git

The Peptonizer relies on a snakemake workflow developed with snakemake 5.10.0.
Installing snakemake requires mamba.

To install mamba:

conda install -n <your_env> -c conda-forge mamba

Alternatively, if you do not have conda installed, you can download mamba directly together with miniforge (instructions from the mamba installation guide):

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

To install snakemake:

conda activate <your_env>
mamba install -c conda-forge -c bioconda -n <your_snakemake_env> snakemake

In accordance with the Snakemake recommendations, we suggest saving your sample data in the resources folder. All outputs will be saved in results.

Additional dependencies necessary are Java and GCC.

The Peptonizer2000 is tested for Linux OS.

All necessary binaries are automatically installed using conda.

Configuration file

The Peptonizer2000 relies on a configuration file in yaml format to set up the workflow. An example configuration file is provided in config/config.yaml.
Do not change the config file location.

Peptonizer parameter

DataDir: Relative path to raw spectra
ResultsDir: Relative path to results
ResourcesDir: Relative path to resources
ExperimentName: Name of subfolder in results
TaxaInPlot: Number of inferred taxa shown in the barplot created from the results csv
Alpha: Grid search increments for alpha
Beta: Grid search increments for beta
prior: Grid search increments for prior
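
To show how the Alpha, Beta and prior increments relate to the repeated belief propagation runs, here is a minimal sketch that expands three lists of values into every parameter combination. It assumes each setting is given as a list of values; the function name and the numbers are hypothetical and not taken from the example config.

```python
from itertools import product


def parameter_grid(alphas, betas, priors):
    """Return one dict per (alpha, beta, prior) combination."""
    return [
        {"alpha": a, "beta": b, "prior": p}
        for a, b, p in product(alphas, betas, priors)
    ]


# Hypothetical increments, for illustration only.
grid = parameter_grid([0.2, 0.5, 0.8], [0.3, 0.6], [0.1, 0.5])
n_runs = len(grid)
```

The number of combinations grows multiplicatively with the number of increments per parameter, so finer grids mean more workflow runs.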

Sample specific parameter

PeptidesAndScores: Path to your .tsv file of input peptides
SampleName: wildcard for spectra file and folder name

UniPept parameter

TaxaNumber: Number of taxa
targetTaxa: Comma-separated list of taxa included in the UniPept query. If querying all of Unipept, use '1'

Output files

All Peptonizer2000 output files are saved into the results folder and include the following:

Main results:

Peptonizer_Results.csv: Table with the values ID, score, type (contains all taxids under 'ID' and all probabilities under 'score'); posterior probabilities of the n (default: 15) highest scoring taxa
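
For downstream analysis, a results table of this shape can be ranked with a few lines of standard-library Python. The sketch below assumes a header row named exactly ID, score, type and a numeric score column. The `row_type` filter and the value 'taxon' in the example data are assumptions, since the possible values of the type column are not listed here.

```python
import csv
import io


def top_taxa(handle, n=15, row_type=None):
    """Return the n highest scoring (ID, score) pairs from a results table."""
    reader = csv.DictReader(handle)
    # Optionally keep only rows whose 'type' column matches row_type.
    rows = [r for r in reader if row_type is None or r["type"] == row_type]
    rows.sort(key=lambda r: float(r["score"]), reverse=True)
    return [(r["ID"], float(r["score"])) for r in rows[:n]]


# Invented example content standing in for Peptonizer_Results.csv.
example = io.StringIO(
    "ID,score,type\n"
    "111,0.91,taxon\n"
    "222,0.15,taxon\n"
    "333,0.64,taxon\n"
)
best = top_taxa(example, n=2)
```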

Additional (intermediate):

Intermediate results folder sorted by their prior value for all possible grid search parameter combinations
TaxaWeights.csv: csv file of all taxids that had at least one protein map to them and their weight
PepGM_graph.graphml: graphml file of the graphical model (without convolution tree factors). Useful to visualize the graph structure and peptide-taxon connections
paramcheck.png: barplot of the metric used to determine the graphical model parameters for n (default: 15) best performing parameter combinations
additional .csv files resulting from the clustering of taxa by peptidome
log files for bug fixing

Testing the Peptonizer

To test the Peptonizer2000 and see if it is set up correctly on your machine, we provide a test file under resources/test_files. This should be downloaded automatically if you follow the installation instructions above. The test file is a .tsv resulting from sample S03 of the CAMPI study, searched against a sample-specific database using X!Tandem and MS2Rescore. The original files are available through PRIDE under PXD023217.

To execute a test run of the Peptonizer2000 using the provided files:

Follow the installation instructions above
In your terminal, go to the folder resources/test_files
Execute the following command to copy the config file to the right directory:

cp ./config.yaml ../../config/

You need to make some alterations to the provided example config file.
- Input the path to the S03 .tsv file. It should be something like 'path_to_workflow_directory/resources/SampleData/S03_test.tsv'

You should now be all set up to run the Peptonizer2000 on the test files. In your terminal, run

snakemake --use-conda --cores <n>

Here, <n> is the number of cores available on your machine to run this workflow. Make sure the mamba environment in which you installed snakemake is active.

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Tanja Holstein - @HolsteinTanja - tanja.holstein@ugent.be
Pieter Verschaffelt - pieter.verschaffelt@ugent.be
