Feature: ims model #98
This is a plot from the Tenzer lab showing the precursor intensity (the top cloud is z=1, the second cloud is mostly z=2, and the tail to the right is mostly z=3+).
LMK what you think! |
In that case, let's continue reporting IM then - I think we can (and should) try improving the model to see if it works better. Once we've tried that, we can attempt to unify the models (or at the very least, feature generation).
I did some more reading on CCS prediction (I know very little about IM/CCS).
Pass one: we fit a linear regression to IM ~ m/z (or CCS ~ m/z).
Inspired by: https://github.com/theGreatHerrLebert/ionmob#getting-insight-into-driving-factors-of-ccs |
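For reference, the "pass one" fit described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in this PR: the function names (`fit_line`, `r_squared`, `fit_per_charge`) are hypothetical, and fitting one line per charge state is an assumption made here because the clouds in the plot separate by charge.

```python
from collections import defaultdict
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on (x, y)."""
    mean_y = fmean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def fit_per_charge(psms):
    """Fit IM ~ m/z separately for each charge state.

    `psms` is an iterable of (mz, charge, ion_mobility) tuples;
    this grouping by charge is an assumption of the sketch.
    """
    grouped = defaultdict(lambda: ([], []))
    for mz, charge, mobility in psms:
        xs, ys = grouped[charge]
        xs.append(mz)
        ys.append(mobility)
    return {charge: fit_line(xs, ys) for charge, (xs, ys) in grouped.items()}
```

The residual between the observed and predicted mobility is then the quantity that could be handed to rescoring; swapping the mobility column for CCS gives the CCS ~ m/z variant.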
Improving the model:
Just as a reference, it seems like the upper theoretical bound of prediction accuracy would be r2=0.998 [ccs] (measurement error), as of Fig. 3 in: https://doi.org/10.1038/s41467-021-21352-8
Complex models (BiLSTM in that paper) achieve r2=0.992 [ccs], and lgbm with VERY SIMPLE features (https://github.com/TalusBio/flimsay) achieves r2=0.9908 [ccs] (0.9845 [1/k0]; 0.987948 [1/k0] with all features I could think of :P).
I tried the "boosted" model and it does not seem to improve anything dramatically on my data: jspaezp@193bbbd

Calculating CCS:
According to doi: 10.1074/mcp.TIR118.000900 Which in my attempt to do some math ... I am still to figure out the dimensional analysis that would simplify to a CCS whose units are BUT it would be a pretty direct equivalence to a scaled version of the also ... based on this figure I am not sure if our residuals off the prediction of mz + charge will be all that much better. |
Interesting that there isn't that much improvement - idk how much CCS/IM prediction should affect rescoring though, so maybe it's OK? |
56392af6ff48797295ad8cb38296c2f238420c3d <- tried the CCS conversion and it seems to do a hair worse than the raw 1/k0. My conclusions atm:
Future direction:
thanks a lot for the feedback! |
I think we should keep it - I can mess around with unification of feature generation too. I want to run a couple more tests, then I will merge! |
Hey guys, I just came across this feature request and think it's a really cool idea! :) I wanted to quickly share some information that might help you further improve it. I'll also discuss what to expect from re-scoring with CCS (Collision Cross Section) features, as we've put a lot of work into this over the past two years.

Understanding CCS vs. 1/K0
While it may seem obvious, it's worth remembering that CCS is calculated indirectly from measured raw 1/K0 values, and for that you always need to know the charge state of the ion. This is just a practical point.

Limitations of the Mason-Schamp Equation
When translating 1/K0 to CCS using the Mason-Schamp equation, consider these limitations. Firstly, the equation is only valid for low-field devices. Secondly, the setup of the ion mobility (IM) device, such as a TIMS (Trapped Ion Mobility Spectrometry) device, significantly influences the accuracy of converting inverse mobilities to CCS for singly charged ions. The default settings might not be ideal for accurate results, and CCS values in training data available online are often incorrect.

Factors Influencing CCS Translation with the MSE
The Mason-Schamp equation also factors in the drift gas mass, which varies between devices and should be an input argument in your translation functions. The pressure and temperature of the drift gas, which are typically not controlled during the experiment, also play a role. The default values from our repository align with our lab conditions, so recalibration of predicted CCS values is necessary for practical application. Here, SAGE could be particularly useful in recalibration using high-confidence identified peptides.

CCS Values in Re-scoring
We've also concluded that CCS values don't significantly enhance re-scoring, likely because IM, unlike retention time, is highly correlated with the mass of the ion. However, significant improvements in re-scoring for singly charged ions were observed in immunopeptidomics, indicating that the utility of CCS in re-scoring depends on the sample and acquisition context.

I hope this information is helpful.
Best,
David |
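To make the points about charge, drift gas mass, and temperature concrete, here is a minimal sketch of the low-field Mason-Schamp conversion written from physical constants. It is not the implementation in this PR or in ionmob: the function names are hypothetical, the defaults (nitrogen drift gas, 305 K) are illustrative assumptions rather than recommended settings, and the ion mass is approximated as m/z times charge.

```python
import math

# CODATA constants (SI units).
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # atomic mass unit, kg
LOSCHMIDT = 2.686780111e25  # gas number density at 273.15 K, 101.325 kPa, 1/m^3


def _ms_factor(mz, charge, gas_mass, temp_k):
    """CCS in square angstroms per unit of 1/K0 (V*s/cm^2), low-field limit."""
    ion_mass = mz * charge  # assumption: ignores the mass of the charge carriers
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON
    prefactor = (3.0 / 16.0) * math.sqrt(2.0 * math.pi / (reduced_mass * K_B * temp_k))
    # 1e4 converts 1/K0 from V*s/cm^2 to V*s/m^2; 1e20 converts m^2 to A^2.
    return prefactor * charge * E_CHARGE / LOSCHMIDT * 1e4 * 1e20


def one_over_k0_to_ccs(one_over_k0, mz, charge, gas_mass=28.013, temp_k=305.0):
    """Convert reduced inverse mobility (V*s/cm^2) to CCS (A^2).

    gas_mass (Da) and temp_k (K) are explicit arguments because they
    differ between instruments; the defaults are assumptions.
    """
    return _ms_factor(mz, charge, gas_mass, temp_k) * one_over_k0


def ccs_to_one_over_k0(ccs, mz, charge, gas_mass=28.013, temp_k=305.0):
    """Inverse of one_over_k0_to_ccs under the same assumptions."""
    return ccs / _ms_factor(mz, charge, gas_mass, temp_k)
```

Because the charge enters the factor directly, the same 1/K0 maps to a different CCS for each charge state, which is why the charge has to be known before converting. Any predicted CCS would still need recalibration against high-confidence identifications from the same run, as described above.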
added mobility to outputs and model details; simplified model; removed redundant logging; changed feature engineering; removed old comments; removed resolved comment
This PR adds two main things.
Discussed here: #73
LMK what you think!
Best
-Sebastian