Overview
Welcome to the PgxSAVy wiki!
PgxSAVy is a tool for quality control and annotation of variant peptides after FDR in proteogenomics.
The PgxSAVy tool is broadly divided into four layers:
1. Data Layer
This layer covers the data files supported for the various search algorithms. Table 1 (in section 5) lists the supported search algorithms with their corresponding data file formats. Care has been taken to read the default output files produced by each search. This layer reads these file formats and converts them to easily handled text formats (TSV/CSV) for faster, easier access and quick filtering when the results are parsed later. MS/MS data is required in 'MGF' format for spectrum-level information.
2. Input Layer
The input layer has file parsers that convert the native result formats to tabular formats. The file-reading function then reads the spectrum, peptide, variant, modification and protein details. Based on the spectrum title, it extracts the matching information from the MS/MS data.
3. Rescoring Layer
Based on the VAS scoring method, a normalized MassWiz (MW) score is calculated for each PSM for the variant peptide, the corresponding wild-type peptide and the shuffled variant decoy peptides. These scores are saved in a two-dimensional array, and a z-score and p-value are estimated. To estimate the z-score, negative-scoring PSMs are used to build the null hypothesis as a normal distribution. If a PSM fails to follow the normal distribution, it belongs to a uniform distribution, which reflects the confident PSMs.
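As a rough illustration of the z-score and p-value step, here is a minimal Python sketch. It is not the PgxSAVy implementation. It assumes that the null normal distribution is fitted from the mean and sample standard deviation of the negative-scoring PSMs and that the p-value is the upper-tail probability under that null; the function names `fit_null` and `zscore_and_pvalue` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def fit_null(scores):
    """Fit a normal null distribution from the negative-scoring PSMs.

    Assumption: the null mean and standard deviation are taken directly
    from the scores below zero. The actual tool may build its null differently.
    """
    negatives = [s for s in scores if s < 0]
    if len(negatives) < 2:
        raise ValueError("need at least two negative-scoring PSMs to fit the null")
    sigma = stdev(negatives)
    if sigma == 0:
        raise ValueError("negative scores have zero spread; cannot fit a normal null")
    return NormalDist(mu=mean(negatives), sigma=sigma)


def zscore_and_pvalue(score, null):
    """Return (z, p) for one PSM score against the fitted null.

    Assumption: p is the one-sided, upper-tail probability, so that
    higher scores give smaller p-values.
    """
    z = (score - null.mean) / null.stdev
    p = 1.0 - null.cdf(score)
    return z, p
```

A score far above the bulk of the negative-scoring PSMs gets a large z-score and a small p-value under these assumptions.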
4. Annotation Layer
The p-value filters (≤ x, where x is the desired cut-off) can be applied directly to classify variant peptides as confident, semi-confident or doubtful. Isobaric and disease annotation is then performed on the identified variants.
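The p-value filtering can be sketched in Python as follows. This is an illustration only: the function name `classify_variant` and the two-threshold scheme are assumptions, and the cut-off values are deliberately left to the caller because this page does not state them.

```python
def classify_variant(p_value, *, confident_cutoff, semi_confident_cutoff):
    """Assign a confidence class to a variant peptide from its p-value.

    Assumption: two user-chosen cut-offs split the p-value range into
    three classes, with smaller p-values meaning higher confidence.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if confident_cutoff > semi_confident_cutoff:
        raise ValueError("confident_cutoff must not exceed semi_confident_cutoff")
    if p_value <= confident_cutoff:
        return "confident"
    if p_value <= semi_confident_cutoff:
        return "semi-confident"
    return "doubtful"


# Illustrative cut-offs only; choose values that suit your data.
label = classify_variant(0.03, confident_cutoff=0.01, semi_confident_cutoff=0.05)
```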
The following flowchart shows the broad details of the PgxSAVy tool functionality and data flow during rescoring and annotation.
Figure 1: Overview of PgxSAVy tool
In PgxSAVy, the following steps are followed during quality filtering and annotation:
- The search result file is read and the related information is stored in a hash.
- The corresponding spectrum information for each scan is read from the MGF file and stored in a hash.
- All scoring components are calculated using a scoring module, which includes the normalized MassWiz score, the wild-type peptide score and the shuffled variant decoy score.
- Based on all these scores, a variant ambiguity score (VAS) is calculated.
- A z-score is estimated from the distribution of scores, and the corresponding p-value is estimated.
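To make the spectrum-reading step above more concrete, here is a minimal Python sketch that parses MGF text into a dictionary (hash) keyed by spectrum title. It is not taken from the PgxSAVy source. The function name `read_mgf` and the layout of the stored records are assumptions, and only the basic MGF structure (`BEGIN IONS`, `KEY=value` headers, peak lines, `END IONS`) is handled.

```python
def read_mgf(lines):
    """Parse MGF lines into a dict keyed by the spectrum TITLE.

    Each value holds the header parameters (as strings) and a list of
    (m/z, intensity) peaks. Spectra without a TITLE are skipped, and
    peak lines with fewer than two columns are ignored.
    """
    spectra = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                title = current["params"].get("TITLE")
                if title is not None:
                    spectra[title] = current
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z and intensity separated by whitespace
                parts = line.split()
                if len(parts) >= 2:
                    current["peaks"].append((float(parts[0]), float(parts[1])))
    return spectra


# Hypothetical two-peak spectrum used only to show the call
example = """BEGIN IONS
TITLE=scan_1
PEPMASS=500.25
CHARGE=2+
100.1 10.0
200.2 20.5
END IONS
""".splitlines()
spectra = read_mgf(example)
```

Keying the dictionary by title mirrors the lookup by spectrum title, so a PSM row from the search results can fetch its spectrum in constant time.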
The PgxSAVy tool uses the following formulation for rescoring.
VAS is defined as:
VAS = ((nMW + ΔSV + ΔWS) / 3) * log10(P + 1) * log3(SE + 1)
where nMW is the normalized MassWiz score, ΔSV is the delta shuffled variant score, ΔWS is the delta wild-type score, P is the number of PSMs per peptide and SE is the number of search engines.
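The VAS formula can be evaluated with a few lines of Python. This is a sketch rather than the tool's own code: the function and parameter names are hypothetical, and it assumes that the division by 3 applies to the sum of the three score components, i.e. that the first factor is their mean.

```python
import math


def variant_ambiguity_score(nmw, delta_sv, delta_ws, psm_count, search_engine_count):
    """Compute VAS from its components.

    Assumption: the three score components are averaged first, then
    weighted by log10(P + 1) and log base 3 of (SE + 1).
    """
    if psm_count < 0 or search_engine_count < 0:
        raise ValueError("counts must be non-negative")
    mean_component = (nmw + delta_sv + delta_ws) / 3
    psm_weight = math.log10(psm_count + 1)
    engine_weight = math.log(search_engine_count + 1, 3)
    return mean_component * psm_weight * engine_weight


# Illustrative values only: 9 PSMs and 2 search engines make both weights equal to 1
vas = variant_ambiguity_score(0.6, 0.3, 0.3, psm_count=9, search_engine_count=2)
```

With 9 PSMs and 2 search engines both logarithmic weights are 1, so the score reduces to the mean of the three components. More supporting PSMs or search engines increase the weights.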