Philosopher 4.1.1 generates empty files due to the incompatibility of the fasta file #537
Comments
Everything looks good except there are no entries in the tsv files. Felipe @prvst , can you take a look? They said that FragPipe 16, which used Philosopher 4.0.0, worked well. Thanks, Fengchao |
@fazeliniah are you running Philosopher v4.1.1? |
I am using Philosopher version 4.1.0 |
I just tried v4.1.1 and got the same issue. |
@fazeliniah your issue is related to a different situation reported a few weeks ago by someone else. Because that person was searching a database containing the same protein with slightly different headers, I had to include the protein description in the method that fetches information from the database annotation. The reason you see a problem with your search is that PeptideProphet also parses the protein description and replaces some characters with empty spaces. I included the same rule in Philosopher, and the update will be available in the upcoming release that I'm planning for next week.
Thanks for reporting this. Added to v4.1.2 |
Hi Felipe @prvst , The BTW, what kind of characters does PeptideProphet replace? Best, Fengchao |
Sorry, I forgot about Percolator. If the parsing rules are different, then yes, it will break the logic. PeptideProphet replaces the pipe character ( | ) with an empty space. This is the only one I'm aware of at the moment; I don't know if the same thing happens with other special characters |
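To illustrate the failure mode described in this comment, here is a minimal Python sketch. It assumes a lookup keyed on the protein ID plus the full description, and it assumes the pipe-to-space replacement is the only rule and that it touches only the description. The header string and the function name `replace_pipes` are made up for illustration; this is not Philosopher's or PeptideProphet's actual code.

```python
def replace_pipes(text: str) -> str:
    """Mimic the single rule named in the thread: each pipe becomes a space."""
    return text.replace("|", " ")


# Hypothetical FASTA header whose description contains a pipe.
fasta_header = "sp|P12345|EXAMPLE_HUMAN Example protein|isoform 2"
protein_id, description = fasta_header.split(" ", 1)

# Index built from the FASTA file, keyed on ID plus the untouched description.
index = {(protein_id, description): "annotation for P12345"}

# Key as it would look if a downstream tool rewrote only the description
# (an assumption made for this sketch).
reported_key = (protein_id, replace_pipes(description))

# The rewritten description no longer matches the FASTA-derived key,
# so the lookup finds nothing and the report ends up without entries.
match = index.get(reported_key)
print(match)
```

With these inputs the lookup misses, which is consistent with a run that finishes without errors but writes empty tables.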
Thanks for the info. I think we can do the same for Percolator. Let me see if I can find the code in PeptideProphet to get all of the characters to be replaced. Best, Fengchao |
Hi Fengchao and team, |
I guess you need to check with Felipe @prvst about the fixed Philosopher. Best, Fengchao |
Please refer to my previous reply. PeptideProphet replaces some special characters, like the pipe, with an empty space, so you might want to avoid them or use a standard format. @fcyu You mentioned above that you would look through the PeptideProphet source code for the special characters that are replaced. Did you make any progress on that? |
Hi Felipe @prvst , Yes, please check the code here https://sourceforge.net/p/sashimi/code/HEAD/tree/trunk/trans_proteomic_pipeline/src/Common/util.cpp#l533. The However, I could not find any code replacing BTW, I think it might not be a good idea to use the protein description as part of the ID. Some tools modify or truncate the protein description in different ways when writing the results, so you will not be able to map proteins back to the fasta file. Best, Fengchao |
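The suggestion to keep the description out of the ID can be sketched as follows. This is a hedged illustration in Python, assuming the ID is everything up to the first whitespace in the header; the function name `accession` and the sample headers are hypothetical and do not come from any of the tools discussed.

```python
def accession(header: str) -> str:
    """Return the header text up to the first whitespace, without a leading '>'."""
    stripped = header.lstrip(">").strip()
    return stripped.split(None, 1)[0] if stripped else ""


# The same hypothetical protein as written in the FASTA file and as it might
# appear after a tool has rewritten and truncated the description.
original = ">sp|P12345|EXAMPLE_HUMAN Example protein|isoform 2"
rewritten = "sp|P12345|EXAMPLE_HUMAN Example protein isoform"

# Both forms reduce to the same key, so the mapping survives description edits.
same_key = accession(original) == accession(rewritten)
print(accession(original), same_key)
```

The trade-off is that an ID-only key requires IDs to be unique in the database, which was not the case in the earlier report about the same protein appearing with slightly different headers.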
Yes, the description is modified |
OK, actually, PeptideProphet does not replace But ProteinProphet does: I will read the ProteinProphet code then. Best, Fengchao
|
Thanks @guoci , it does have more rules than replacing Best, Fengchao |
Hi @fazeliniah , Can you re-analyze your data using this MSFragger (https://www.dropbox.com/s/xggvogvbqq7nmhf/MSFragger-3.5-rc8.zip?dl=0)? It will clean the protein description according to the rules used by ProteinProphet, which will avoid triggering the Philosopher bug. Best, Fengchao |
Sorry, I forgot one more thing. With this change in MSFragger, we don't need to change Percolator and other tools, because the protein descriptions have already been cleaned up at the very beginning (ProteinProphet won't change the protein descriptions anymore). But Philosopher still needs to apply the same clean-up rules when loading the fasta file; otherwise, it will not be able to map the proteins in pep.xml back to the fasta file. Felipe @prvst , can you make the changes according to the Thanks, Fengchao
|
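The point about applying the same clean-up rules when loading the fasta file can be sketched like this. It is a minimal Python illustration, not the Philosopher or MSFragger implementation: `clean_description` stands in for the ProteinProphet rule set, which is not spelled out in this thread, so only the pipe replacement is used as a placeholder, and all names and sample data are hypothetical.

```python
import io


def clean_description(text: str) -> str:
    """Placeholder for the shared rule set; only the pipe rule is assumed here."""
    return text.replace("|", " ")


def load_fasta_index(handle):
    """Map (protein ID, cleaned description) to the sequence."""
    index = {}
    key, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if key is not None:
                index[key] = "".join(chunks)
            protein_id, _, description = line[1:].partition(" ")
            # Clean at load time so keys match already-cleaned pep.xml entries.
            key = (protein_id, clean_description(description))
            chunks = []
        elif line:
            chunks.append(line)
    if key is not None:
        index[key] = "".join(chunks)
    return index


def lookup(index, protein_id, description):
    """Clean the incoming description with the same rule before the lookup."""
    return index.get((protein_id, clean_description(description)))


fasta = io.StringIO(">P1 kinase|variant A\nMKT\nAAA\n>P2 plain\nGGG\n")
index = load_fasta_index(fasta)
print(lookup(index, "P1", "kinase variant A"))
```

Because the placeholder rule is idempotent, the lookup succeeds whether the description arrives already cleaned or still in its raw form.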
Hi Fengchao, |
I observed the same issue with standard search using GenCode database. Need to fix for the next release |
fixed |
Thanks. |
Dear developer team,
I am trying to use the fasta database generated in this paper: https://www.nature.com/articles/s41587-021-01021-3. In brief, the new fasta file has an additional ~320K proteins resulting from different RNA sequencing data.
Version 16 of MSFragger handled these data very nicely. Unfortunately, I can't replicate the analysis in v.17.1, and I don't know why. The job finished without any errors, but the peptide/protein lists are empty. For validation I am using PeptideProphet (for nonspecific search).
I have put the output of the analysis here: https://www.dropbox.com/sh/ciq36i6shg79d6z/AAA1l-4QX1t5ZjJjXtbPmp91a?dl=0
Thank you again as always for your great program and support.