PGA is an R package for identification of novel peptides by customized database derived from RNA-Seq or DNA-Seq data. This package provides functions for construction of customized protein databases based on RNA-Seq data with/without genome guided or DNA-Seq data, database searching, post-processing and report generation. This kind of customized protein database includes both the reference database (such as Refseq or ENSEMBL) and the novel peptide sequences form RNA-Seq data or DNA-Seq data.
Please read this document to find how to use PGA: PGA tutorial. If you have any questions about PGA, please open an issue here: open an issue.
Demo report of PGA output: Demo report.
Global FDR estimation at PSM or peptide level: example
Separate FDR estimation at PSM or peptide level: example
More about PGA.
To install PGA:
# Install the development version from GitHub:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
install.packages("remotes")
BiocManager::install("wenbostar/PGA")
Use PGA in docker (recommended):
Find detail at https://github.com/wenbostar/PGA/wiki/Use-PGA-docker.
To cite the PGA
package in publications, please use:
Wen B, Xu S, Zhou R, et al. PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq. BMC bioinformatics, 2016, 17(1): 244. DOI: 10.1186/s12859-016-1133-3
Wen, B., Xu, S., Sheynkman, G.M., Feng, Q., Lin, L., Wang, Q., Xu, X., Wang, J. and Liu, S., 2014. sapFinder: an R/Bioconductor package for detection of variant peptides in shotgun proteomics experiments. Bioinformatics, 30(21), pp.3136-3138. DOI: 10.1093/bioinformatics/btu397
PGA
/sapFinder
has been cited in the following manuscripts:
- Ignatchenko, Alexandr, et al. "Detecting protein variants by mass spectrometry: a comprehensive study in cancer cell-lines." Genome medicine 9.1 (2017): 62.
- Luan, Ning, et al. "A combinational strategy upon RNA sequencing and peptidomics unravels a set of novel toxin peptides in scorpion Mesobuthus martensii." Toxins 8.10 (2016): 286.
- Ma, Chunwei, et al. "Improvement of peptide identification with considering the abundance of mRNA and peptide." BMC bioinformatics 18.1 (2017): 109.
- Zhang, Jia, et al. "GAPP: A Proteogenomic Software for Genome Annotation and Global Profiling of Post-translational Modifications in Prokaryotes." Molecular & Cellular Proteomics 15.11 (2016): 3529-3539.
- Proffitt, J. Michael, et al. "Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys." BMC genomics 18.1 (2017): 877.
- Dimitrakopoulos, Lampros, et al. "Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction." Critical reviews in clinical laboratory sciences 54.6 (2017): 414-432.
- Ruggles, Kelly V., et al. "Methods, tools and current perspectives in proteogenomics." Molecular & Cellular Proteomics 16.6 (2017): 959-981.
- Peng, Xinxin, et al. "A-to-I RNA editing contributes to proteomic diversity in cancer." Cancer cell 33.5 (2018): 817-828.
- Zhang, Minying, et al. "RNA editing derived epitopes function as cancer antigens to elicit immune responses." Nature communications 9.1 (2018): 3919.
- Low, Teck Yew, et al. "Connecting Proteomics to Next‐Generation Sequencing: Proteogenomics and Its Current Applications in Biology." Proteomics (2018): 1800235.
- Wen, Bo, Xiaojing Wang, and Bing Zhang. "PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations." Genome research 29.3 (2019): 485-493.
- Lobas, Anna A., et al. "Proteogenomics of malignant melanoma cell lines: the effect of stringency of exome data filtering on variant peptide identification in shotgun proteomics." Journal of proteome research 17.5 (2018): 1801-1811.
- Robin, Thibault, et al. "Large-scale reanalysis of publicly available HeLa cell proteomics data in the context of the Human Proteome Project." Journal of proteome research 17.12 (2018): 4160-4170.
- Yang, Mingkun, et al. "Genome annotation of a model diatom Phaeodactylum tricornutum using an integrated proteogenomic pipeline." Molecular plant 11.10 (2018): 1292-1307.
- Cifani, Paolo, et al. "ProteomeGenerator: A framework for comprehensive proteomics based on de novo transcriptome assembly and high-accuracy peptide mass spectral matching." Journal of proteome research 17.11 (2018): 3681-3692.
- Misra, Biswapriya B. "Updates on resources, software tools, and databases for plant proteomics in 2016–2017." Electrophoresis 39.13 (2018): 1543-1557.
- Rong, Mingqiang, et al. "The defensive system of tree frog skin identified by peptidomics and RNA sequencing analysis." Amino Acids 51.2 (2019): 345-353.
- Rong, Mingqiang, et al. "PPIP: Automated Software for Identification of Bioactive Endogenous Peptides." Journal of proteome research (2019).
- Nagaraj, Shivashankar H., et al. "PGTools: a software suite for proteogenomic data analysis and visualization." Journal of proteome research 14.5 (2015): 2255-2266.
- Sheynkman, Gloria M., et al. "Proteogenomics: integrating next-generation sequencing and mass spectrometry to characterize human proteomic variation." Annual review of analytical chemistry 9 (2016): 521-545.
- Menschaert, Gerben, and David Fenyö. "Proteogenomics from a bioinformatics angle: A growing field." Mass spectrometry reviews 36.5 (2017): 584-599.
- Komor, Malgorzata A., et al. "Identification of differentially expressed splice variants by the proteogenomic pipeline Splicify." Molecular & Cellular Proteomics 16.10 (2017): 1850-1863.
- Hernandez-Valladares, Maria, et al. "Proteogenomics approaches for studying cancer biology and their potential in the identification of acute myeloid leukemia biomarkers." Expert review of proteomics 14.8 (2017): 649-663.
- Ischenko, Dmitry, et al. "Large scale analysis of amino acid substitutions in bacterial proteomics." BMC bioinformatics 17.1 (2016): 450.
Contributions to the package are more than welcome.