Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add R_c (convergence fraction) convergence analysis #104

Closed
orbeckst opened this issue Jul 16, 2020 · 10 comments · Fixed by #239
Closed

add R_c (convergence fraction) convergence analysis #104

orbeckst opened this issue Jul 16, 2020 · 10 comments · Fixed by #239

Comments

@orbeckst
Copy link
Member

Add the analysis of converging fraction R_c from the paper

S. Fan, B. I. Iorga, and O. Beckstein. Prediction of octanol-water partition coefficients for the SAMPL6- log P molecules using molecular dynamics simulations with OPLS-AA, AMBER and CHARMM force fields. Journal of Computer-Aided Molecular Design, 34:543–560, 2020. https://doi.org/10.1007/s10822-019-00267-z

cc @VOD555

@orbeckst
Copy link
Member Author

For other convergence analysis see http://getyank.org/latest/algorithms.html

@xiki-tempula
Copy link
Collaborator

@orbeckst Thanks for this issue. I'm interested in this way of checking convergence as well. I wonder if you have any code snippet or library that we could use?

@VOD555
Copy link
Contributor

VOD555 commented Aug 15, 2022

convergence.zip
@xiki-tempula Here's the script we used in SAMPL6 and SAMPL7 challenges.

@xiki-tempula
Copy link
Collaborator

@VOD555 Thank you for the script. I have a question with regard to the paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8397498/#FD17

I wonder how do you compute the equation 17 and 18?

Is the Ac actually being used? Thank you.

@orbeckst
Copy link
Member Author

Quick comment (@VOD555 should be able to provide more details):

  1. In order to compute the cumulative prob. distribution C(Rc) (eq 17), you take the Rc for each lambda window (let's call them Rc_lambda, and then count for a range of values 0 ≤Rc ≤ 1 (e.g. evenly spaced), how many Rc_lambda ≤ Rc. Divide all the counts by the total number of lambda windows. (There might be a smart way to do this with cumulative sums and @VOD555 should be able to point you to actual code.) You can also create a function from heaviside step functions, something like
    def C_Rc_func_factory(Rc_lambdas):
         def f(Rc):
              return np.sum([np.heaviside(Rc - Rc_lambda, 0.5) for Rc_lambda in Rc_lambdas]) / len(Rc_lambdas)
         return f
    Not tested...
  2. We used Ac in Fig 2, and Tables 2-5.

@orbeckst
Copy link
Member Author

P.S.: Fig 4 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667952/ should make clearer, what 𝒞(𝑅𝐶) looks like :

Cumulative distribution functions 𝒞(𝑅𝐶) of the convergence time fraction Rc for Coulomb and VDW λ windows

@xiki-tempula
Copy link
Collaborator

@orbeckst Thanks. I know this might sound like a reviewer question.
Why do you "Divide all the counts by the total number of lambda windows."
Instead of just taking the average of the 𝑅𝐶 or (1-𝑅𝐶) across all lambda windows?

@orbeckst
Copy link
Member Author

orbeckst commented Aug 17, 2022

I wrote down one way in which one can calculate the cumulative probability function that you see in the figure. I am not sure that you can do this with a simple average.

Why do you "Divide all the counts by the total number of lambda windows."

So that my $\mathcal{C}(R_c)$ is a number between 0 and 1.

(If I am making a mistake somewhere here I hope @VOD555 will correct me.)

We initially wanted a quantity that told us something about the distribution of Rc values. The distribution itself says something about individual values (but we didn't really use it). The cumulative distribution seemed important to us as a measure that assesses all windows together because all lambda values are used for a single free energy estimate. The Ac was then just a way to condense the C(Rc) into a single number.

I don't think that the $A_c$ is equal to the average over all the $R_c$ (or over $1 - R_c$): The average is

$$ \langle R_c \rangle = \int_0^1 dR_c p(r_c) R_c $$

where $p(R_c)$ is probability to observe $R_c$ over all $\lambda$.

On the other hand,

$$ \begin{align} \mathcal{C}(R_c) &= \int_0^{R_c} dR_c' p(R_c')\\ A_c &= \int_0^1 dR_c \mathcal{C}(R_c) \end{align} $$

so it's clear that $A_c \neq \langle R_c \rangle$. They measure different things.

Sorry if I misunderstood your question .... please feel free to ask again.

@xiki-tempula
Copy link
Collaborator

xiki-tempula commented Sep 22, 2022

Thanks. Some small questions.
Did you run it on only dHdl or the relevant column on the u_nk as well?
Did you run the R_c on the original dataset or the dataset after equilibrium detection?

@VOD555
Copy link
Contributor

VOD555 commented Sep 22, 2022

In our SAMPL7 work we only run it on dHdl, but in principle, one can run it on other column on the u_nk.
We run the R_c after equilibrium detection.

orbeckst added a commit that referenced this issue Oct 31, 2022
* Fix #104
* add @VOD555 's fractional equilibration time analysis for convergence (Fan et al 2021)
* reimplemented the R_c (based on code provided by @VOD555) and the A_c based on my understanding of the paper
* improved performance of original implementation by doing initial block average for "precision"-size blocks
* add docs with example graph
* add tests
* update CHANGES

Co-authored-by: Zhiyi Wu <zwu@exscientia.co.uk>
Co-authored-by: Shujie Fan <gn8008@hotmail.com>
Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants