Clarification on uncertainty estimates for dataset metrics #22
Hi @jmichel80 and @ppxasjsm, please find three CSV files containing the experimental and calculated affinities for the Thrombin dataset reported in the specified paper. In the paper, R is reported as 0.71 +/- 0.24 ("Errors for [...] R values by use of the bootstrapping method are also reported."). I obtain R = 0.706, but the R confidence interval as calculated by freenrgworkflows is about 0.128 < 0.282 < 0.423 (i.e. 0.282 +/- 0.154 or 0.140) when taking the "cycle errors" into account, or 0.102 < 0.264 < 0.411 (i.e. 0.264 +/- 0.163 or 0.146) when assuming an error of 1.1 kcal/mol per perturbation. Any feedback on what may differ between the bootstrapping performed in freenrgworkflows and in the FEP+ paper would be greatly appreciated.
The first thing I noticed is that you are using absolute rather than relative free energies. They are fine to use for R, but not for the other metrics. I'll write a method that supports reading in absolute free energies. As for the other issues, I am still debugging the behaviour.
Thanks for the quick feedback. I forgot to mention that I am generating the files containing the relative binding free energies on the fly, so we do not necessarily need a method in freenrgworkflows to handle that.
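(For what it's worth, a minimal sketch of what such an on-the-fly conversion could look like; the function name is made up, and a real perturbation network would only use its actual edges rather than all pairs:)

```python
import itertools

def absolute_to_relative(dg_abs):
    """Turn absolute binding free energies {ligand: dG} into pairwise
    relative free energies {(lig_a, lig_b): ddG = dG_b - dG_a}.

    Illustrative only: this enumerates all ligand pairs, whereas a real
    perturbation network would use only the edges that were simulated.
    """
    return {(a, b): dg_abs[b] - dg_abs[a]
            for a, b in itertools.combinations(sorted(dg_abs), 2)}

# absolute_to_relative({"lig1": -9.1, "lig2": -8.3}) -> {("lig1", "lig2"): ~0.8}
```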
So from my understanding of reading the SI, the analysis I implemented and the one they suggest are very similar. You end up subsampling both the experimental and the computed values according to a Gaussian distribution (in fact, I do not take experimental variation into account), with sigma set to 1.1 kcal/mol for the computed free energies and 0.4 kcal/mol for the experimental free energies. So my algorithm looks like this:
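(The code block itself did not survive in this thread; the following is a minimal reconstruction of the procedure described above, with illustrative names, sketched under the assumption of independent Gaussian noise on every value:)

```python
import numpy as np
from scipy import stats

def subsample_r(exp, calc, sigma_exp=0.4, sigma_calc=1.1,
                n_samples=10000, seed=None):
    """Draw n_samples perturbed copies of the data and collect Pearson R.

    Each replicate adds independent Gaussian noise: sigma_exp (kcal/mol)
    to the experimental values and sigma_calc (kcal/mol) to the computed
    values. The ligand population itself never changes.
    """
    exp, calc = np.asarray(exp, float), np.asarray(calc, float)
    rng = np.random.default_rng(seed)
    r_values = np.empty(n_samples)
    for i in range(n_samples):
        exp_s = rng.normal(exp, sigma_exp)     # perturbed experimental dGs
        calc_s = rng.normal(calc, sigma_calc)  # perturbed computed dGs
        r_values[i], _ = stats.pearsonr(exp_s, calc_s)
    return r_values
```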
When I do this for one of your example datasets I get the following: I would naively think, oh... why is R so much worse after the subsampling, there must be something wrong (hence the question). Actually, there is a paper that answers this quite well. What I suspect happened is that they computed the correlation coefficient of the original data and then the standard deviation of the subsampled data. The standard deviation is around 0.25, which is exactly what they report; it just so happens that the mean of the dataset gets drastically shifted by the introduced error, and this is what freenrgworkflows reports. I have attached my calculations for you to take a look. Essentially, what I am advocating is reporting the subsampled mean R together with its standard deviation, rather than pairing the R of the original data with the subsampled spread (see the sketch below).
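(To make the suspected mismatch concrete, a self-contained toy example using the subsample_r sketch above; the numbers are synthetic, not the Thrombin data:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
exp = rng.normal(-8.0, 1.5, size=20)        # toy experimental dGs (kcal/mol)
calc = exp + rng.normal(0.0, 1.1, size=20)  # toy computed dGs with FEP-like noise

r_raw, _ = stats.pearsonr(exp, calc)        # R of the unperturbed data
r_sub = subsample_r(exp, calc, seed=2)      # distribution of R under added noise

# Suspected FEP+ reporting: raw R paired with the subsampled spread.
print(f"raw R, subsampled std:   {r_raw:.2f} +/- {r_sub.std():.2f}")
# Self-consistent alternative: mean and std of the same distribution.
print(f"subsampled mean +/- std: {r_sub.mean():.2f} +/- {r_sub.std():.2f}")
```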
The only caveat is that I may misunderstand how to sample from these Gaussian distributions, but this is pretty much code copied from a script @jmichel80 gave me a long time ago.
Thanks for the detailed feedback and clarification! I get the same results using my script. However, it does not completely match the numbers provided in the paper: if I understand the SI correctly, the approach above (assuming errors of 1.1 and 0.4 kcal/mol) is reported as the "anticipated FEP R-value" (second to last line of the table). When running the uncertainty estimation (fifth cell in your notebook) 100 times, the mean and std of R are 0.34 +/- 0.26, which is slightly different from the reported 0.37 +/- 0.26. Testing the same script on Tyk2 gives 0.62 +/- 0.13, which is significantly different from the reported 0.74 +/- 0.10 (input files: [issue22_tyk.zip](https://github.com/michellab/freenrgworkflows/files/3767232/issue22_tyk.zip)). Furthermore, please note that there is a small difference between the standard deviation of the "observed R-value FEP" (9th line) and the "anticipated FEP R-value" (Thrombin: 0.24 vs. 0.26, Tyk2: 0.07 vs. 0.10). Hence, I assume that
* we are not calculating the "anticipated FEP R-value" correctly, and
* the bootstrapping for the "observed" values is performed slightly differently and not according to the method described in the SI.
However, in both cases I have no idea where the difference might be coming from. Any thoughts on this? @ppxasjsm @jmichel80
What I implemented in the notebook is what the SI says, as far as I understand. I am not really sure what their DG_i value is, other than the one they report, i.e. representing the mean of the distribution. I'll give it another careful read. The data you gave me is the FEP+ data?
Hi Max,
- The small differences you see may be due to the number of repeats you carried out. Try running 1000 repeats to see if that changes anything.
I think there is some terminology confusion in the thread.
- Toni's code, which is based on some old scripts I wrote, which were in turn based on the above-mentioned Brown et al. paper, resamples the distributions (it does not subsample).
- There is no bootstrapping going on: the same population is used in every experiment. If bootstrapping were used, each resampling would have a different composition of ligands and R would fluctuate even more (this is the approach used in the SAMPL/D3R analyses; see the sketch below).
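(For contrast, a minimal sketch of what ligand bootstrapping in the SAMPL/D3R sense would look like, as opposed to the fixed-population resampling above; names are illustrative:)

```python
import numpy as np
from scipy import stats

def bootstrap_r(exp, calc, n_samples=10000, seed=None):
    """Bootstrap over ligands: each replicate draws a new set of ligands
    with replacement, so the composition of the dataset itself varies."""
    exp, calc = np.asarray(exp, float), np.asarray(calc, float)
    rng = np.random.default_rng(seed)
    n = len(exp)
    r_values = np.empty(n_samples)
    for i in range(n_samples):
        idx = rng.integers(0, n, size=n)  # ligand indices, with replacement
        r_values[i], _ = stats.pearsonr(exp[idx], calc[idx])
    return r_values
```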
Some other comments.
- Based on the SI of Wang et al., it is possible that the experimental data is resampled with sigma = 0.4 kcal/mol for the purpose of determining the maximum plausible R, and that the computed FEP data is resampled with sigma = 1.1 kcal/mol for determining the uncertainty on R, but using a fixed set of experimental values, which would introduce less noise than perturbing the x-values by sigma = 1.1 kcal/mol and the y-values by 0.4 kcal/mol.
- I agree with Toni that it makes sense for the mean R to drop if we introduce random noise into an initial correlation.
- I don't like the idea of arbitrarily using 1.1 kcal/mol, because it folds systematic (model) errors into a metric that I would prefer to interpret as the likelihood of getting a similar result if I repeat the calculations. If we instead plug statistical uncertainties into the model, we get an uncertainty estimate closer to the latter interpretation (subject to the DDG uncertainties being robust, of course).
Best wishes,
Julien
What I don't understand is equation 1.1 in the SI. Is this just the condition that the overall sums of the experimentally known and the computed free energies must be the same?
@ppxasjsm I agree that the content in your notebook should reproduce the results outlined in the SI. As we both wrote our code independently after reading the SI and get the same result, we might well both be overlooking something. The data is taken from the Excel sheet supplied with the SI. Regarding your other comment: yes, they reweight the computed results so that the sums of the experimental and computed results are equal.
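(Assuming that constraint amounts to a single constant shift of the computed values, a minimal sketch; if the SI instead applies per-ligand weights, this is only an approximation:)

```python
import numpy as np

def center_on_experiment(calc, exp):
    """Shift all computed dGs by one constant so that sum(calc) == sum(exp)."""
    calc, exp = np.asarray(calc, float), np.asarray(exp, float)
    return calc + (exp.sum() - calc.sum()) / len(calc)
```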
Mmmh, so we are using the DGs they provide and generating new samples based on the Gaussian distributions they are using...
@jmichel80 Thanks for your suggestions. I have now been running it with 1000 repeats, but the mean and stdev do not change. I find this really strange... Assessing the uncertainty of R in the way you suggest certainly makes sense. However, reporting the actual R value with the standard deviation of the subsampled R values does not seem completely correct to me, and I have not felt too confident doing that. What's your view on that? I am also not a huge fan of the 1.1 kcal/mol estimate, especially since some perturbations are clearly easier than others. I would be okay with this approach if I expected all perturbations to be of similar difficulty (i.e. the same type of transformation). For example, it overestimates the difficulty for the Thrombin dataset, which contains only fairly easy perturbations. @ppxasjsm It may sound dumb, but would you mind extracting the data for Thrombin and Tyk2 from the SI yourself and running your script on it, just to ensure I am not messing something up at that early stage?
I get the same prediction as the paper for Tyk2: 0.89 is the direct correlation coefficient of experimental vs. computed values. The std I get from subsampling is slightly larger than the 0.07, but the correlation is the same. I also don't think we should use 1.1 kcal/mol as a general error for computed results; there are certainly datasets that are more uncertain than that, and the variability from repeats seems a better indicator. I don't think you should report R from the raw data together with the std from the subsampling; that is misleading, even though with the subsampling you are going to get a seemingly worse correlation. Reporting them separately may be useful, but the most meaningful value is really the subsampled mean ± std, or confidence intervals, since the distribution is not perfectly Gaussian. I can extract the data and do a consistency check, yes; will do that now and get back to you.
So I had another very close look at this and examined all the FEP+ compounds. I can regenerate the straight-up R value for all of them, but I cannot get the right results for the ones resampled from the distributions. I also noticed some seemingly inconsistent results in the spreadsheet. See for example CDK2: R is 0.48 +/- 0.19, but supposedly the Exp R FEP is 0.73 +/- 0.11, which definitely seems off. Attached is my complete analysis of this, but the simple printout looks like this:
For the notebook to work you'll have to change the file paths, but I guess that is kind of self-explanatory. My suggestion for how to proceed:
I am at a bit of a loss as to how exactly we are meant to reproduce the stated data with the information given, but I don't really see anything wrong with what we are doing. @jmichel80 What do you think?
Ah, in terms of freenrgworkflows: I can incorporate the sampling of the experimental uncertainties as well, which isn't currently done. Let me know if you want me to do a quick update of the code.
Hi Toni, thanks again for your feedback, suggestions and for testing the datasets. It's really strange that we cannot reproduce this, especially as the results differ quite a lot (looking at the p38 dataset, for example). As you suggested, I also tend towards reporting the raw R and the subsampled R ± std separately.
Hi @ppxasjsm, from discussions with @maxkuhn it would also be good to clarify:
- why the R_mean value from X resampled calculated free energies is always lower than the R value of the original dataset;
- why the uncertainty estimates always seem to be larger than those reported in the FEP+ study. According to https://github.com/michellab/freenrgworkflows/blob/devel/networkanalysis/stats.py we seem to be reporting the 68% CI, which should be similar to the +/- 1 sigma uncertainty estimated according to https://pubs.acs.org/doi/suppl/10.1021/ja512751q/suppl_file/ja512751q_si_001.pdf (pp. 9-10); see the sketch below.
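(A sketch of the two summaries being compared; for a symmetric distribution they coincide, but the distribution of R is bounded above by 1 and typically skewed, so the percentile-based 68% CI and the +/- 1 sigma interval can genuinely differ:)

```python
import numpy as np

def summarise_r(r_values):
    """Compare the percentile-based 68% CI with mean +/- 1 sigma."""
    lower, median, upper = np.percentile(r_values, [16, 50, 84])  # 68% CI
    return {"median": median, "ci68": (lower, upper),
            "mean": r_values.mean(), "sigma": r_values.std()}
```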
Yet @maxkuhn says he has processed the FEP+ dataset with freenrgworkflows and obtained different R uncertainties, even though similar DDG uncertainties were used. @maxkuhn, please post your input with the commands you ran and the output you got, to allow @ppxasjsm and myself to reproduce your findings.