SequenceWindow, Ile over-representation on the n-term side #500

Open
pisistrato opened this issue Jul 18, 2024 · 5 comments

Comments

@pisistrato

Hi,

I was inspecting the files generated in the tmt-report folder. I noticed a suspicious over-representation of Ile in the SequenceWindow on the N-terminal side, i.e. before the detected peptide sequence.
Can you comment on how that is calculated?
It might be real, but I was expecting a Leu to be over-represented...

@fcyu fcyu self-assigned this Jul 18, 2024
@fcyu
Member

fcyu commented Jul 18, 2024

It just uses the sequence of the assigned protein in the fasta file.

Best,

Fengchao

@pisistrato
Author

Since it was very strange, I checked the sequences manually. This is what I see:

| ProteinID | SequenceWindow | Start | SequenceWindowFromFasta | Fasta |
| --- | --- | --- | --- | --- |
| A0A8I5KX85 | TAPVQAPPAP | 148 | TAPVQAPPAP | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | AIKIQLDNQY | 239 | ALKLQLDNQY | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | NQAIKLQLDN | 237 | NQALKLQLDN | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | FPSIQSTAKH | 199 | FPSLQSTAKH | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |

The first one is correct; the others are not.
FYI, Start refers to the starting position excluding the xxxxx.

@fcyu
Member

fcyu commented Jul 18, 2024

The second one is also correct because there is a peptide ALKLQLDNQY in the protein. We don't distinguish I and L when mapping peptides to proteins because they have identical mass.

Best,

Fengchao
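
For anyone reading along, the I/L-agnostic mapping described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: the function names `normalize_il`, `find_il_agnostic`, and `il_differences` are hypothetical, and the protein fragment is a short excerpt of the A0A8I5KX85 sequence posted earlier in this thread.

```python
from typing import List, Tuple


def normalize_il(seq: str) -> str:
    """Collapse isoleucine (I) into leucine (L), since the two are isobaric."""
    return seq.replace("I", "L")


def find_il_agnostic(peptide: str, protein: str) -> List[int]:
    """Return 1-based start positions of `peptide` in `protein`,
    treating I and L as the same residue (hypothetical helper)."""
    pep = normalize_il(peptide)
    prot = normalize_il(protein)
    hits = []
    pos = prot.find(pep)
    while pos != -1:
        hits.append(pos + 1)           # convert to 1-based position
        pos = prot.find(pep, pos + 1)  # keep searching for further matches
    return hits


def il_differences(window: str, fasta_window: str) -> List[Tuple[int, str, str]]:
    """List (0-based index, reported residue, FASTA residue) for every
    position where two equal-length windows differ."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(window, fasta_window))
        if a != b
    ]


# Short excerpt of the protein sequence from the table above.
fragment = "RDEVSKNQALKLQLDNQYAVLENQK"

# The reported window still maps to the protein once I and L are merged.
print(find_il_agnostic("AIKIQLDNQY", fragment))    # [9]

# The only mismatches against the FASTA are I/L swaps.
print(il_differences("AIKIQLDNQY", "ALKLQLDNQY"))  # [(1, 'I', 'L'), (3, 'I', 'L')]
```

A check like `il_differences` makes it easy to confirm that a reported window differs from the FASTA only by I/L swaps, which is the pattern in rows 2 to 4 of the table.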

@pisistrato
Author

pisistrato commented Jul 18, 2024 via email

@fcyu fcyu transferred this issue from Nesvilab/FragPipe Jul 18, 2024
@fcyu
Member

fcyu commented Jul 18, 2024

Yes, this is a known bug: #430

@fcyu fcyu removed their assignment Aug 1, 2024