NAMD parsers #7

dotsdl · 2017-02-06T20:58:56Z

In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. Since there are essentially two types of estimators (TI and FEP), each packages needs a parser for:

Extracting reduced potentials u_nk from output files (for FEP).
Extracting derivatives DHdl from output files (for TI).

This issue is the nexus for discussion for such parsers for the NAMD package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.

The text was updated successfully, but these errors were encountered:

davidlmobley · 2017-02-09T00:48:16Z

I'm going to ask Christophe Chipot about this. @Limn1 in my group also has some basic parsers which might be good enough/serve as a starting point/provide a skeleton.

davidlmobley · 2017-02-15T00:57:27Z

Chipot is unfortunately unable to help at present, he reports. @Limn1 may be able to help but is tied up with some stuff on deadline til early March, I expect.

davidlmobley · 2017-11-28T01:21:17Z

I talked to @vtlim about this today. She could use some of her existing data, but we also batted around the idea of generating some sample data for a simpler test system.

orbeckst · 2017-11-28T01:24:51Z

Good, feel free to grab alchemistry/alchemtest#17 !

mrshirts · 2017-11-28T03:07:36Z

There's a data set from JC Gumbart for the Tyr->Ala example from the NAMD FEP tutorial that he gave me. The data files are at http://simbac.gatech.edu/forMichael/NAMD-FEP-tutorial-TYR2ALA.tgz

If these look useful, then I can contact him about the requisite permissions.

vtlim · 2017-11-28T04:31:28Z

@mrshirts - if this is the same Tyr->Ala example as in this tutorial, I've played around with some of their example output a while back.

So, yes, it'd be helpful to see if we had permissions to use these.

orbeckst · 2017-12-05T19:22:50Z

I moved the discussion of tests datasets to alchemistry/alchemtest#17

orbeckst · 2017-12-12T00:38:50Z

@davidlmobley please assign this issue (and alchemistry/alchemtest#17) to @vtlim

vtlim · 2017-12-12T08:19:59Z

@orbeckst - Apologies for the delay; I had a proposal application due in past hour.
Also, I see mention of "reST" in alchemistry/alchemtest#17
but could not find mention of this term elsewhere in either repo. This may be a trivial question,
but can you clarify what it is?

orbeckst · 2017-12-12T08:36:33Z

On Dec 12, 2017, at 1:19 AM, Victoria Lim ***@***.***> wrote: @orbeckst <https://github.com/orbeckst> - Apologies for the delay; I had a proposal application due in past hour.

No problem, I just didn’t know if you got the github invite.

Also, I see mention of "reST" in [alchemistry/alchemtest#17 <alchemistry/alchemtest#17>]

restructured text, the doc markup format, as parsed by sphinx http://www.sphinx-doc.org/en/stable/rest.html

…

but could not find mention of this term elsewhere in either repo. This may be a trivial question, but can you clarify what it is? (alchemistry/alchemtest#17 <alchemistry/alchemtest#17>)

davidlmobley · 2017-12-12T16:43:35Z

@orbeckst - we've not used Sphinx yet, so this is helpful. yet another argument for a general "best practices for code development" writeup. :)

orbeckst · 2018-01-17T20:54:26Z

@vtlim the NAMD data set is now in alchemtest – thank you for making this work.

I tentatively added this feature to the 0.2.0 release but if you are pretty busy at the moment then we can easily move it to 0.3.0. We can release as quickly as we like.

vtlim · 2018-01-31T20:09:25Z

@orbeckst -- can you clarify what the fep-lambda column means in the pandas dataframe? I was reading the docs here but it's not clear to me what that column is for. I'm not sure if I should add that column after parsing the output file in namd.py or elsewhere.

orbeckst · 2018-01-31T23:03:30Z

The html-formatted docs are nicer: https://alchemlyb.readthedocs.io/en/latest/parsing.html

I don't know what the NAMD data look like. But if it comes from any kind of free energy calculation then there should be either (for thermodynamic integration, TI) dH/dlambda derivatives (the dHdl standard form or for FEP/BAR/MBAR kind of calculations, potential energies evaluated at different lambdas (the u_nk standard form).

Do you know what kind of calculations were done in the NAMD case?

If it is dHdl then "fep-lambda" would be a generic name for the lambda value at which dHdl was sampled. For Gromacs, one has typically multiple lambda values (coul-lambda, vdw-lambda, maybe restr-lambda, ...) but for the Amber reader they are just called "lambdas"

alchemlyb/src/alchemlyb/parsing/amber.py

Line 36 in e5b2e58

df["lambdas"] = data_dic["lambdas"][0]

vtlim · 2018-02-01T17:16:34Z

The NAMD case I am working on is the FEP-based estimator (the extract_u_nk function).

Based on the link you sent:

For datasets that sample only a single λ parameter, then the DataFrame will feature only a single index in addition to time, with the values of λ for which reduced potentials were recorded given as column labels.

My understanding of the above was that the different columns represent the lambda value of sampling, but that would mean the fep-lambda column is doing something else since it has values of both 0 and 1 in the example in the docs.

orbeckst · 2018-02-01T23:08:50Z

@vtlim I am looking at the test data. But while doing so, I found that we overlooked something in the test data accessor. I pushed PR alchemistry/alchemtest#29 – if you have a moment could you review the PR, please? Thanks.

I'll say more about u_nk when I had a look at the data... hopefully in a few minutes.

orbeckst · 2018-02-02T00:05:51Z

For datasets that sample only a single λ parameter, then the DataFrame will feature only a single index in addition to time, with the values of λ for which reduced potentials were recorded given as column labels.

My understanding of the above was that the different columns represent the lambda value of sampling, but that would mean the fep-lambda column is doing something else since it has values of both 0 and 1 in the example in the docs.

The DataFrame should contain all data needed to run BAR/MBAR or one of the FEP estimators (which we don't have implemented yet). The "fep-lambda" index states at which lambda this particular frame was sampled. If you have one output file at lambda=0, one at lambda=0.5, and one at lambda=1 then your parser needs to read all the data files and join them together in one big DataFrame. The timesteps (or trajectory "frames") are each tagged with the lambda at which they were sample, i.e., data from the lambda=0.5 file will all have fep-lambda 0.5.

The columns are the evaluations of the Hamiltonian (or the potential energy U) at other lambdas (sometimes called "foreign lambdas"): take positions sampled at fep-lambda lambda0, x_lambda0 and evaluate the energy function U(x, lambda=lambda1) with the lambda set to foreign lambda1: U(x=x_lambda0(t), lambda=lambda1) — this gives you row lambda0 with column lambda1 (at time t).

I am looking at the data (the following needs PR alchemistry/alchemtest#29 to work):

>>> import alchemtest.namd
>>> import bz2
>>> t = alchemtest.namd.load_tyr2ala()
>>> fwd = bz2.open(t.data['forward'][0], mode="rt").readlines()
>>> print("".join(fwd[0:5]))
#            STEP                 Elec                            vdW                    dE           dE_avg         Temp             dG
#                           l             l+dl             l            l+dl         E(l+dl)-E(l)
#NEW FEP WINDOW: LAMBDA SET TO 0 LAMBDA2 0.05
FepEnergy:     10      -5391.7740     -5390.5907       533.4814       532.2565        -0.0415        -0.1133       302.6836        -0.1161
FepEnergy:     20      -5281.7946     -5280.8118       452.4335       451.4536         0.0029        -0.0665       298.4974        -0.0702

It looks to me as if we have a single lambda (as you say), which has the value lambda0=0. I am not quite sure how to interpret LAMBDA2 and the l+dl; I am assuming that l+dl == LAMBDA2. In my notation then you would apparently have Elec and vdW evaluated at lambda0 (l) and lambda1 (l+dl). I don't know if the sum of Elec and vdW is the full change in energy.

I think the question is how do you normally process the NAMD data to generate the free energies?

vtlim · 2018-02-02T22:55:04Z

If you have one output file at lambda=0, one at lambda=0.5, and one at lambda=1 then your parser needs to read all the data files and join them together in one big DataFrame.

NAMD joins the different lambda windows in its standard routine. Here's a web example of a NAMD output file. You can see that new windows are declared by "#NEW FEP WINDOW: LAMBDA SET TO ..." I don't have an additional set of lambda values other than these.

I am assuming that l+dl == LAMBDA2. In my notation then you would apparently have Elec and vdW evaluated at lambda0 (l) and lambda1 (l+dl).

Yes, that should be the case for both statements.

I don't know if the sum of Elec and vdW is the full change in energy.

Correct. Essentially, the dE column is equivalent to elec(l+dl) - elec(l) added to vdW(l+dl) - vdW(l).

I think the question is how do you normally process the NAMD data to generate the free energies?

We have two fepout files from NAMD: one representing the forward work of transformation, and one representing the reverse work. For each lambda window, (e.g., 0 to 0.05), we:

subsample fwd and rev based on statistical efficiency (done with the timeseries module of pymbar )
apply the BAR estimator to get the dG value
Then net dG is the sum of the windows.

I can share the code I am in the process of adapting and paste the example output from the data I put in alchemtest.

vtlim · 2018-02-03T01:55:03Z

The DoBAR function in this script does what I discussed above.

vtlim · 2018-02-03T01:56:01Z

Applying the code I shared above on the example that is in alchemtest:
ex1_180126.txt

orbeckst · 2018-02-06T17:32:56Z

From what I can gather from what you posted is that you should assemble the data into a DataFrame that roughly looks like the following

                                   0.0           0.05             0.1            0.15          ......
time  fep-lambda

0.0       0.0                 xxx               yyy              0                0
1.0       0.0                 xxx               yyy              0                0
...
...
0.0       0.05                 0                xxx            yyy              0             
1.0       0.05                 0                xxx            yyy              0             
...
...
0.0       0.1                 0                    0           xxx             yyy                   
1.0       0.1                 0                    0           xxx             yyy                      
...
...

I think this should be for the total E (because that would be the u_nk) but I'd be happy if @mrshirts or @davidlmobley would chime in.

We could either have a function that also extracts the Elec and VDW components into the same format. Or one could add additional multi-indices to this DataFrame but it seems to me that this would break the overall API ( @dotsdl ?)

davidlmobley · 2018-02-06T18:42:30Z

That sounds right, I think. But @vtlim - you basically have this already working with BAR in your existing analysis, right? So you should be able to take a file you can analyze there, format it to analyze with alchemlyb, and check that you get the same answers... It should be using the same code (BAR/MBAR) for analysis, so the differences should just be in terms of how the data flow is organized.

vtlim · 2018-02-08T00:46:09Z

@orbeckst - BAR is not yet in alchemlyb, is it? I don't see it in the docs nor in my conda installation of alchemlyb estimators. I reassembled my DataFrame, but it's not suited for MBAR.
The sum of all N_k must equal the total number of samples (length of second dimension of u_kn.)

In other words, I have 20,000 samples based on 20,000 time steps, and this is not equal to the number of windows (20).

orbeckst · 2018-02-08T01:13:29Z

The missing BAR estimator #28 is one of the major open issues.

I don't think that the MBAR estimator requires you to have as many foreign lambdas as samples. Does this error (is it an error message?) come from alchemlyb or from pymbar?

davidlmobley · 2018-02-08T01:27:13Z

@vtlim could you say a little more about how you reassembled your data frame? Also, the traceback of the error message would probably help.

I think the error message is telling you that you have $N_k$ for some lambda which is not equal to the number of samples at that lambda value, e.g. at some lambda value you have $N_k$ set to, say, 15,345 but the number of snapshots provided is 20,000 (or some such).

vtlim · 2018-02-08T02:09:15Z

The error message above came from pymbar. The error is raised here.

I think the error is actually not an issue with mbar but with my dataframe somehow, triggered by somewhere around here. Right now, my code passes the wrong value for N_k which is why MBAR is unable to work.

        u_nk = u_nk.sort_index(level=u_nk.index.names[1:])

        groups = u_nk.groupby(level=u_nk.index.names[1:])
        N_k = [(len(groups.get_group(i)) if i in groups.groups else 0) for i in u_nk.columns]

For the Gromacs data I'm testing, it returns N_k of [4001, 0, 0, 0, 0] because there are 4001 samples and 5 sets of lambda values. For the NAMD data I'm testing, I have 1001 samples and 20 lambda windows. Here, N_k is [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]. I would expect the first item in that list to be 1001.

Here's the head of the Gromacs dataframe:

                         0.00       0.25       0.50       0.75       1.00
time    fep-lambda                                                       
0.0     0.0         13.257467  16.496997  19.736528  22.976059  26.215589
10.0    0.0          9.232471  11.465868  13.699265  15.932662  18.166059
20.0    0.0          5.423357   6.706389   7.989420   9.272451  10.555482

Here's the head of the NAMD dataframe:

                         0    0.05     0.1    0.15     0.2    0.25     0.3  \
time    fep-lambda                                                           
10000.0 0.0        -0.2674  0.5577  0.4597  0.5728  0.2116  0.5174  0.1851   
10010.0 0.0        -0.1513  0.2481  0.2725  0.2319  0.5871  0.5013  0.1299   
10020.0 0.0         0.1559 -0.1338  0.3915  0.1978  0.5235  0.6366  0.1526   
10030.0 0.0         0.1265  0.1149  0.5136 -0.6523  0.4662  0.4138  0.4163

I'm not sure where the difference lies right now, but it must have something to do with how I created the pandas dataframe. I printed out the groups variable of the code above N_k, and both look analogous, having levels of (1) the time, (2) sequential integers, (3) zeroes with each level the same length as number of samples. I'm relatively new to pandas, so maybe there is something I'm not seeing. Any suggestions on where to look?

vtlim · 2018-02-08T03:03:13Z

I found the issue. The lambda values of the column headers have an inconsistent number of decimal places as printed out by NAMD. That means the 0 of the column header doesn't match exactly the 0.0 of the fep-lambda column. Right now I have a workaround to list the column headers and the fep-lambda column to be of two decimal points and of the same data type. How I end up fixing that workaround depends on how I want to structure the final dataframe (which should depend on the BAR implementation, I think).

The dataframe I printed out above is technically not correct since I populated by the column lambdas instead of the fep-lambda. The correct version (which I have also tested successfully with MBAR) looks more along the lines of what @orbeckst said earlier (but I have the yyy values as zero):

                       0.00  0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  \
timestep fep-lambda                                                           
10000.0  0.00       -0.2674   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   
10010.0  0.00       -0.1513   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   
10020.0  0.00        0.1559   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   
10030.0  0.00        0.1265   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0

The reason that I think this should depend on how BAR is implemented is that it should take in a list of forward and reverse work values. This last dataframe I'm showing has 20020 rows compared to 1001 rows, and most of them are zero. It seems to me that the first format I posted would make more sense, but that would mean that the lambdas take on a different meaning in this application -- which I'm not sure would be consistent with the API.

mrshirts · 2018-03-25T23:48:20Z

What's the current status on this, @vtlim ? Anything that needs looking at?

vtlim · 2018-03-26T05:46:20Z

@mrshirts -- I have code takes the NAMD output and formats it into a pandas dataframe consistent with the existing API. The dataframe works with MBAR but I'm not sure if the output values are correct (I'm not sure how valid it is to put this dataset into MBAR. Though I realize that MBAR reduces to BAR in the limit of sampling just two states.) As of now, having the BAR estimator (#28) would be ideal for me to wrap this up.

mrshirts · 2018-03-26T06:07:32Z

Thanks for letting me know that BAR is a limiting issue now. Yes, one could examine it by simply running BAR on all pairwise states; that will give free energies that are equivalent to BAR.

orbeckst · 2019-01-11T21:05:43Z

@vtlim we now have a BAR estimator (issue #28) in the development version. Any chance you can pick this one up again?

vtlim · 2019-01-14T04:54:22Z

@orbeckst - I'm not sure I'll be able to work on the NAMD parser soon due to file system rearrangements + operating system updates on our local cluster and also having not worked on this project in a while. So I can work on it, but probably not anytime soon.

dotsdl added help wanted parsers labels Feb 6, 2017

orbeckst mentioned this issue Nov 28, 2017

include NAMD dataset alchemistry/alchemtest#17

Closed

4 tasks

orbeckst assigned davidlmobley Nov 28, 2017

orbeckst removed the help wanted label Nov 28, 2017

orbeckst assigned vtlim Dec 12, 2017

orbeckst added this to the release 0.2.0 milestone Jan 16, 2018

orbeckst removed this from the release 0.2.0 milestone Feb 16, 2019

orbeckst mentioned this issue Feb 16, 2019

new release 0.2.0 #53

Closed

orbeckst added the help wanted label Feb 16, 2019

vtlim mentioned this issue Feb 28, 2019

NAMD parser for FEP files #72

Merged

orbeckst removed the help wanted label Mar 18, 2019

orbeckst added this to the release 0.2.0 milestone Mar 18, 2019

orbeckst closed this as completed in 4ed712e Mar 21, 2019

orbeckst mentioned this issue Apr 14, 2019

NAMD parser does not take temperature as argument #75

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NAMD parsers #7

NAMD parsers #7

dotsdl commented Feb 6, 2017

davidlmobley commented Feb 9, 2017

davidlmobley commented Feb 15, 2017

davidlmobley commented Nov 28, 2017 •

edited

Loading

orbeckst commented Nov 28, 2017

mrshirts commented Nov 28, 2017

vtlim commented Nov 28, 2017 •

edited

Loading

orbeckst commented Dec 5, 2017

orbeckst commented Dec 12, 2017

vtlim commented Dec 12, 2017 •

edited

Loading

orbeckst commented Dec 12, 2017 via email

davidlmobley commented Dec 12, 2017

orbeckst commented Jan 17, 2018

vtlim commented Jan 31, 2018

orbeckst commented Jan 31, 2018

vtlim commented Feb 1, 2018

orbeckst commented Feb 1, 2018

orbeckst commented Feb 2, 2018

vtlim commented Feb 2, 2018 •

edited by orbeckst

Loading

vtlim commented Feb 3, 2018

vtlim commented Feb 3, 2018

orbeckst commented Feb 6, 2018 •

edited

Loading

davidlmobley commented Feb 6, 2018

vtlim commented Feb 8, 2018

orbeckst commented Feb 8, 2018

davidlmobley commented Feb 8, 2018

vtlim commented Feb 8, 2018 •

edited

Loading

vtlim commented Feb 8, 2018 •

edited

Loading

mrshirts commented Mar 25, 2018

vtlim commented Mar 26, 2018

mrshirts commented Mar 26, 2018

orbeckst commented Jan 11, 2019

vtlim commented Jan 14, 2019 •

edited

Loading

NAMD parsers #7

NAMD parsers #7

Comments

dotsdl commented Feb 6, 2017

davidlmobley commented Feb 9, 2017

davidlmobley commented Feb 15, 2017

davidlmobley commented Nov 28, 2017 • edited Loading

orbeckst commented Nov 28, 2017

mrshirts commented Nov 28, 2017

vtlim commented Nov 28, 2017 • edited Loading

orbeckst commented Dec 5, 2017

orbeckst commented Dec 12, 2017

vtlim commented Dec 12, 2017 • edited Loading

orbeckst commented Dec 12, 2017 via email

davidlmobley commented Dec 12, 2017

orbeckst commented Jan 17, 2018

vtlim commented Jan 31, 2018

orbeckst commented Jan 31, 2018

vtlim commented Feb 1, 2018

orbeckst commented Feb 1, 2018

orbeckst commented Feb 2, 2018

vtlim commented Feb 2, 2018 • edited by orbeckst Loading

vtlim commented Feb 3, 2018

vtlim commented Feb 3, 2018

orbeckst commented Feb 6, 2018 • edited Loading

davidlmobley commented Feb 6, 2018

vtlim commented Feb 8, 2018

orbeckst commented Feb 8, 2018

davidlmobley commented Feb 8, 2018

vtlim commented Feb 8, 2018 • edited Loading

vtlim commented Feb 8, 2018 • edited Loading

mrshirts commented Mar 25, 2018

vtlim commented Mar 26, 2018

mrshirts commented Mar 26, 2018

orbeckst commented Jan 11, 2019

vtlim commented Jan 14, 2019 • edited Loading

davidlmobley commented Nov 28, 2017 •

edited

Loading

vtlim commented Nov 28, 2017 •

edited

Loading

vtlim commented Dec 12, 2017 •

edited

Loading

vtlim commented Feb 2, 2018 •

edited by orbeckst

Loading

orbeckst commented Feb 6, 2018 •

edited

Loading

vtlim commented Feb 8, 2018 •

edited

Loading

vtlim commented Feb 8, 2018 •

edited

Loading

vtlim commented Jan 14, 2019 •

edited

Loading