Philosopher 4.1.1 generates empty files due to the incompatibility of the fasta file #537

fazeliniah · 2021-11-23T17:37:37Z

Dear developer team,
I am trying to use the FASTA database generated in this paper: https://www.nature.com/articles/s41587-021-01021-3. In brief, the new FASTA file has about 320K additional proteins derived from different RNA sequencing data.
MSFragger v.16 handled these data very nicely. Unfortunately, I can't replicate the analysis in v.17.1, and I don't know why. The job finishes without any errors, but the peptide and protein lists are empty. For validation I am using PeptideProphet (for a nonspecific search).
I have put the output of the analysis in here: https://www.dropbox.com/sh/ciq36i6shg79d6z/AAA1l-4QX1t5ZjJjXtbPmp91a?dl=0
Thank you again as always for your great program and support.

fcyu · 2021-11-23T17:52:58Z

Everything looks good except that there are no entries in the tsv files.

Felipe @prvst , can you take a look? They said that FragPipe 16, which implies Philosopher 4.0.0, worked well.

Thanks,

Fengchao

prvst · 2021-11-25T21:51:29Z

@fazeliniah are you running Philosopher v4.1.1?

fazeliniah · 2021-11-29T16:00:20Z

I am using Philosopher version 4.1.0

fazeliniah · 2021-11-29T19:14:07Z

I just tried the v.4.1.1 and got the same issue.

prvst · 2021-12-03T21:03:08Z

@fazeliniah your issue is related to a different situation reported a few weeks ago by another person. Because someone was searching a database containing the same protein with slightly different headers, I had to include the protein description in the method that fetches information from the database annotation. You see a problem with your search because PeptideProphet also parses the protein description and replaces some characters with spaces. I added the same rule to Philosopher, and the update will be available in the upcoming release that I'm planning for next week.

INFO[16:00:38] 1+ Charge profile                             decoy=29 target=265
INFO[16:00:38] 2+ Charge profile                             decoy=146 target=3776
INFO[16:00:38] 3+ Charge profile                             decoy=114 target=3031
INFO[16:00:38] 4+ Charge profile                             decoy=25 target=628
INFO[16:00:38] 5+ Charge profile                             decoy=0 target=0
INFO[16:00:38] 6+ Charge profile                             decoy=0 target=0
INFO[16:00:38] Database search results                       ions=6159 peptides=5252 psms=8014
INFO[16:00:38] Converged to 1.00 % FDR with 6546 PSMs        decoy=66 threshold=0.7128 total=6612
INFO[16:00:38] Converged to 1.00 % FDR with 4081 Peptides    decoy=41 threshold=0.8093 total=4122
INFO[16:00:38] Converged to 0.99 % FDR with 4927 Ions        decoy=49 threshold=0.7409 total=4976
INFO[16:00:39] Protein inference results                     decoy=297 target=3657
INFO[16:00:39] Converged to 1.04 % FDR with 1819 Proteins    decoy=19 threshold=0.9706 total=1838
INFO[16:00:39] Applying sequential FDR estimation            ions=4733 peptides=3975 psms=6298
INFO[16:00:39] Converged to 0.39 % FDR with 6273 PSMs        decoy=25 threshold=0.7132 total=6298
INFO[16:00:39] Converged to 0.48 % FDR with 3956 Peptides    decoy=19 threshold=0.7132 total=3975
INFO[16:00:39] Converged to 0.40 % FDR with 4714 Ions        decoy=19 threshold=0.7132 total=4733
INFO[16:00:40] Post processing identifications              
INFO[16:00:43] Assigning protein identifications to layers  
INFO[16:00:46] Processing protein inference                 
INFO[16:02:26] Synchronizing PSMs and proteins              
INFO[16:02:26] Total report numbers after FDR filtering, and post-processing  ions=4613 peptides=3869 proteins=1742 psms=6150

Thanks for reporting this.

Added to v4.1.2

fcyu · 2021-12-03T21:11:53Z

Hi Felipe @prvst ,

The interact-*.pep.xml from Percolator won't have such a replacement. Will your changes break the Percolator-related workflows?

BTW, which characters does PeptideProphet replace?

Best,

Fengchao

prvst · 2021-12-03T21:33:25Z

Sorry, I forgot about Percolator. If the parsing rules are different, then yes, it will break the logic. PeptideProphet replaces the pipe character ( | ) with a space. This is the only one I'm aware of at the moment; I don't know whether the same thing happens with other special characters.
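
The mismatch described above can be sketched in a few lines. This is only an illustration, not Philosopher or TPP code: the header, the sequence, and the helper names `split_header` and `clean_description` are made up, and the only rewriting rule applied is the pipe-to-space replacement mentioned in this thread.

```python
# Hypothetical illustration; not the actual Philosopher or TPP implementation.

def split_header(header: str) -> tuple[str, str]:
    """Split a FASTA header into (protein ID, description) at the first space."""
    protein_id, _, description = header.partition(" ")
    return protein_id, description


def clean_description(description: str) -> str:
    """Apply the single rule named in this thread: '|' becomes a space."""
    return description.replace("|", " ")


# Made-up header whose description contains a pipe character.
fasta_header = "sp|P12345|EXAMPLE_HUMAN Example protein|variant A"
protein_id, description = split_header(fasta_header)

# Key as it would look after a downstream tool rewrote the description.
reported_key = (protein_id, clean_description(description))

# Index built from the raw FASTA header: the rewritten key is missing.
raw_index = {(protein_id, description): "MKTAYIAK"}
found_raw = reported_key in raw_index

# Index built with the same cleaning rule: the key is found again.
clean_index = {(pid, clean_description(desc)): seq
               for (pid, desc), seq in raw_index.items()}
found_clean = reported_key in clean_index

print(found_raw, found_clean)
```

A lookup that silently misses every protein would be consistent with a run that finishes without errors but writes empty tables.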

fcyu · 2021-12-03T21:39:50Z

Thanks for the info. I think we can do the same for Percolator. Let me see if I can find the code in PeptideProphet to get all of the characters to be replaced.

Best,

Fengchao

fazeliniah · 2022-01-03T17:38:38Z

Hi Fengchao and team,
Hope you all had a wonderful holiday.
I am just wondering if there is an update for this issue.
Thanks

fcyu · 2022-01-03T21:01:13Z

I guess you need to check with Felipe @prvst about the fixed Philosopher.

Best,

Fengchao

prvst · 2022-01-03T21:10:19Z

Please refer to my previous reply. PeptideProphet replaces some special characters, like the pipe, with a space. You might want to avoid them or use a standard header format.

@fcyu You mentioned above that you would look through the PeptideProphet source code for the special characters that are replaced. Did you make any progress on that?

fcyu · 2022-01-03T21:19:59Z

Hi Felipe @prvst ,

Yes, please check the code here https://sourceforge.net/p/sashimi/code/HEAD/tree/trunk/trans_proteomic_pipeline/src/Common/util.cpp#l533. The XMLEscape(const string& s) function is used by the RefreshParser.cpp: https://sourceforge.net/p/sashimi/code/HEAD/tree/trunk/trans_proteomic_pipeline/src/Parsers/RefreshParser/RefreshParser.cpp#l1660

However, I could not find any code replacing | with space, can you confirm that it is replaced?

BTW, I think it might not be a good idea to use the protein description as part of the ID. Some tools modify or truncate the protein description in different ways when writing their results, so you would not be able to map proteins back to the FASTA file.

Best,

Fengchao
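
A minimal sketch of the alternative suggested above, keying proteins by the ID alone. It assumes the ID is the text before the first whitespace in the header; the helper name `protein_id` and the example headers are hypothetical.

```python
# Hypothetical sketch: key proteins by ID only, ignoring the description.

def protein_id(header: str) -> str:
    """Return the text before the first whitespace of a FASTA header line."""
    return header.lstrip(">").split(None, 1)[0]


# The same made-up entry as it might appear in the FASTA file, after a tool
# rewrote the pipe in the description, and after a tool truncated it.
original = ">sp|P12345|EXAMPLE_HUMAN Example protein|variant A"
rewritten = ">sp|P12345|EXAMPLE_HUMAN Example protein variant A"
truncated = ">sp|P12345|EXAMPLE_HUMAN Example prot"

ids = {protein_id(h) for h in (original, rewritten, truncated)}
print(ids)
```

All three variants collapse to one key. The trade-off is that this only works when IDs are unique, which the earlier report about the same protein with slightly different headers suggests is not always the case.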

prvst · 2022-01-03T21:23:40Z

> However, I could not find any code replacing | with space, can you confirm that it is replaced?

Yes, the description is modified

fcyu · 2022-01-03T21:52:25Z

OK, actually, PeptideProphet does not replace |:

But ProteinProphet does:

I will read the ProteinProphet code then.

Best,

Fengchao

guoci · 2022-01-03T21:54:36Z

It is here https://sourceforge.net/p/sashimi/code/HEAD/tree/trunk/trans_proteomic_pipeline/src/Validation/ProteinProphet/ProteinProphet.cpp#l7256

fcyu · 2022-01-03T22:00:09Z

Thanks @guoci , it does have more rules than replacing | with a space. I can add them to MSFragger so that downstream tools will not need to make any changes.

Best,

Fengchao

fcyu · 2022-01-03T22:51:58Z

Hi @fazeliniah ,

Can you re-analyze your data using this MSFragger (https://www.dropbox.com/s/xggvogvbqq7nmhf/MSFragger-3.5-rc8.zip?dl=0)? It will clean the protein descriptions according to the rules used by ProteinProphet, which should avoid triggering the Philosopher bug.

Best,

Fengchao

fcyu · 2022-01-03T23:29:43Z

Sorry that I forgot one more thing.

With this change in MSFragger, we don't need to change Percolator and other tools because the protein descriptions have already been cleaned up at the very beginning (ProteinProphet won't change the protein descriptions anymore).

However, Philosopher still needs to apply the same clean-up rules when loading the FASTA file. Otherwise, it will not be able to map the proteins in pep.xml back to the FASTA file.

Felipe @prvst , can you make the changes according to the cleanUpProteinDescription function pointed out by Guo Ci, and send the fixed Philosopher?

Thanks,

Fengchao

fazeliniah · 2022-01-21T14:47:08Z

Hi Fengchao,
I tested MSFragger 3.4 and 3.5, and both work nicely with our HLA peptidome project. The issue was related to our RNA-derived FASTA database: the presence of new characters in the headers (e.g. +, -, *, ~) and some duplicate sequences were the main causes. Thank you again for all your help.
Thanks
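
For anyone who wants to check a custom database for the two problems described above before searching, here is a rough sketch. The set of flagged characters is taken from this comment only and is not a list of what any tool actually rewrites; the function names and the example entries are hypothetical.

```python
from collections import defaultdict

# Characters named in this comment; an assumption, not an official rule set.
FLAGGED = set("+-*~")


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(text):
    """Return headers with flagged characters and sequences seen more than once."""
    flagged = []
    by_sequence = defaultdict(list)
    for header, seq in read_fasta(text):
        hits = sorted(FLAGGED & set(header))
        if hits:
            flagged.append((header, hits))
        by_sequence[seq].append(header)
    duplicates = {s: h for s, h in by_sequence.items() if len(h) > 1}
    return flagged, duplicates


# Made-up entries for illustration.
example = """>prot1 normal description
MKTAYIAK
>prot2 chr1:100-200(+) novel~isoform
MKTAYIAK
>prot3 plain
GGSLLV
"""

flagged, duplicates = check_fasta(example)
for header, hits in flagged:
    print("flagged:", header, hits)
for seq, headers in duplicates.items():
    print("duplicate sequence:", seq, headers)
```

To run it on a real database, read the file into a string and pass it to `check_fasta`; for very large files a streaming reader would be a better fit.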

anesvi · 2022-01-24T19:50:27Z

I observed the same issue with standard search using GenCode database. Need to fix for the next release

fcyu · 2022-12-12T15:30:36Z

> I observed the same issue with standard search using GenCode database. Need to fix for the next release

@prvst @anesvi Is it fixed?

Best,

Fengchao

prvst · 2022-12-12T18:32:04Z

fixed

fcyu · 2022-12-12T18:33:29Z

Thanks.

fcyu transferred this issue from Nesvilab/MSFragger Nov 23, 2021

fcyu assigned prvst Nov 23, 2021

fcyu added the Philosopher label Nov 23, 2021

prvst closed this as completed Dec 3, 2021

fcyu changed the title ~~MF17.1 and HLA analysis~~ Philosopher 4.1.1 generates empty files due to the incompatibility of the fasta file Dec 3, 2021

fcyu pinned this issue Dec 3, 2021

fcyu reopened this Dec 3, 2021

fcyu self-assigned this Dec 3, 2021

fcyu mentioned this issue Jul 4, 2023

Let Philosopher print errors when there are protein hits from PSMs that cannot be found in the fasta file. Nesvilab/philosopher#447

Open

fcyu unpinned this issue Jan 3, 2022

fcyu closed this as completed Dec 12, 2022
