Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The problem with pymbar analysis. #87

Closed
9527567 opened this issue Jul 7, 2024 · 3 comments
Closed

The problem with pymbar analysis. #87

9527567 opened this issue Jul 7, 2024 · 3 comments

Comments

@9527567
Copy link
Contributor

9527567 commented Jul 7, 2024

Perhaps I should reopen an issue and describe it here.

@egallicc
Copy link
Collaborator

egallicc commented Jul 8, 2024

I have never used pymbar. However, here are some comments/suggestions:

  1. MBAR requires the bias energy to be expressed in kT units. For example:
def bias_fcn(epert, beta, lam1, lam2, alpha, u0, w0):
    # ALPHA, U0, W0COEFF
    # Initialize ebias1 with zeros
    ebias1 = np.zeros_like(epert)
    
    # Check if alpha is greater than 0
    if alpha > 0:
        ee = 1 + np.exp(-alpha*(epert - u0))
        ebias1 = (lam2 - lam1) * np.log(ee) / alpha
    
    return beta * ( ebias1 + lam2 * epert + w0 )

where beta = 1/(kB T). AToM's .out files express energies in kcal/mol units, so for T=300 K, beta = 1./(1.986e-3*300.0) for all samples and states.

  1. Multi-state free energy inference is based on calculating the bias energy of all samples at all states. I think your code does it only for the state in which the sample was collected. If the perturbation energy samples are collected in usamples, something like this would calculate the bias matrix u_kn required by pymbar.MBAR():
nstates = len(lambda1)
u_kn = np.zeros((nstates, len(usamples)))
for k in range(nstates):
      u_kn[k] = bias_fcn(usamples, bet[k], lambda1[k], lambda2[k], alpha[k], u0[k], w0[k])

The second argument of pymbar.MBAR(), N_k is the number of samples collected at each state k. See below.

  1. The free energy analysis is performed for the two ATM legs separately and the binding free energy is the difference of the corresponding free energies. In your case, leg1's states correspond to DIRECTION > 0 (the first 11 states) and leg2's to DIRECTION < 0 (the second 11 states). So, for example, this will give you the lambda1's values for leg1:
lambda1 = np.array(config['LAMBDA1'], dtype=np.float64)[0:11]

And something like this would collect only leg1's samples from the .out files:

def read_u_energy(name: str, n_replicas: int = 22):
    usamples = []
    n_k = [ 0 for i in range(    int(n_replicas/2)  ) ]
    for k in range(n_replicas):
        lines = Path(f'r{k}/{name}.out').read_text().split('\n')
        for line in lines:
            values = line.split()
            if len(values) > 0:
                stateid = int( values[0] )
                direction = float( values[2] )
                if direction > 0:
                    usamples.append( float( values[9] ) )
                    n_k[ stateid ] += 1
    return ( np.array(usamples,dtype=np.float64), np.array(n_k,np.int32) )

Call it as:

(usamples, N_k) = read_u_energy(args.name, 22)

The DIRECTION value is stored in the third column. Only samples with DIRECTION > 0 are loaded. The function returns the perturbation energy samples and the number of samples collected at each state. Note that all replicas' .out files are scanned because leg1's samples can be found in any of the 22 replicas, even those started in leg2's states. Also, for production, consider discarding the first portion of the data for equilibration. For example, do for line in lines[20:] above to discard the first 20 samples of each replica.

  1. With the above, I think this would calculate the free energy difference for leg1:
mbar = pymbar.MBAR(u_kn = u_kn, N_k=N_k)
print( mbar.compute_free_energy_differences() )

@9527567
Copy link
Contributor Author

9527567 commented Jul 15, 2024

Thank you, I have modified the analysis code.

import numpy as np
import argparse
import sys
from configobj import ConfigObj
from pathlib import Path
import pymbar
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument('-n', '--name', required=True, help='name')
parser.add_argument('-c', '--config', required=True, help='config file')
parser.add_argument('-s', '--start', required=False,
                    help='start sample', default=0)
parser.add_argument('-e', '--end', required=False,
                    help='end sample', default=sys.maxsize)

args = parser.parse_args()


def bias_fcn(epert, beta, lam1, lam2, alpha, u0, w0):
    # ALPHA, U0, W0COEFF
    # Initialize ebias1 with zeros
    ebias1 = np.zeros_like(epert)

    # Check if alpha is greater than 0
    if alpha > 0:
        ee = 1 + np.exp(-alpha*(epert - u0))
        ebias1 = (lam2 - lam1) * np.log(ee) / alpha

    return beta * (ebias1 + lam2 * epert + w0)


def read_u_energy(name: str, n_replicas: int = 22, start: int = 0, end: int = sys.maxsize):
    lig1_usamples = []
    lig2_usamples = []
    n_k1 = [0 for i in range(int(n_replicas/2))]
    n_k2 = [0 for i in range(int(n_replicas/2))]

    for k in range(n_replicas):
        lines = Path(f'r{k}/{name}.out').read_text().split('\n')
        for line in lines[start: end]:
            values = line.split()
            if len(values) > 0:
                stateid = int(values[0])
                direction = float(values[2])
                if direction > 0:
                    lig1_usamples.append(float(values[9]))
                    n_k1[stateid] += 1
                else:
                    lig2_usamples.append(float(values[9]))
                    n_k2[stateid-11] += 1
    return (np.array(lig1_usamples, dtype=np.float64), np.array(lig2_usamples, dtype=np.float64), np.array(n_k1, np.int32), np.array(n_k2, np.int32))


def read_config(config_file: str):
    config = ConfigObj(config_file)
    return config


if __name__ == "__main__":
    config = read_config(args.config)

    lambda1 = np.array(config['LAMBDA1'], dtype=np.float64)
    lambda2 = np.array(config['LAMBDA2'], dtype=np.float64)
    lambdas = np.array(config['LAMBDAS'], dtype=np.float64)

    alpha = np.array(config['ALPHA'], dtype=np.float64)
    u0 = np.array(config['U0'], dtype=np.float64)
    w0 = np.array(config['W0COEFF'], dtype=np.float64)
    (lig1_usamples, lig2_usamples, N_k1, N_k2) = read_u_energy(
        args.name, 22, int(args.start), int(args.end))
    nstates = 11
    u_kn1 = np.zeros((nstates, len(lig1_usamples)))
    u_kn2 = np.zeros((nstates, len(lig2_usamples)))

    for k in range(nstates):
        u_kn1[k] = bias_fcn(lig1_usamples, 1./(1.986e-3*300.0),
                            lambda1[k], lambda2[k], alpha[k], u0[k], w0[k])
    for k in range(11, 22):
        u_kn2[k-11] = bias_fcn(lig2_usamples, 1./(1.986e-3*300.0),
                               lambda1[k], lambda2[k], alpha[k], u0[k], w0[k])
    u_nk1 = pd.DataFrame(u_kn1)
    u_nk2 = pd.DataFrame(u_kn2)

    mbar1 = pymbar.MBAR(u_kn=u_kn1, N_k=N_k1, verbose=True,
                        solver_protocol='robust')
    mbar2 = pymbar.MBAR(u_kn=u_kn2, N_k=N_k2, verbose=True,
                        solver_protocol='robust')

    a = mbar1.compute_free_energy_differences()
    b = mbar2.compute_free_energy_differences()
    print(a['Delta_f'])
    print(b['Delta_f'])

Furthermore, can we use alchemlyb for analysis? But I don't know how to create its input as follows.
image

@egallicc
Copy link
Collaborator

Thank you.

Sorry, I am not familiar with alchemlyb's input. Presumably, it would be related to the output of bias_fcn() as for MBAR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants