The problem with pymbar analysis. #87

9527567 · 2024-07-07T08:14:02Z

Perhaps I should reopen an issue and describe it here.

egallicc · 2024-07-08T12:46:08Z

I have never used pymbar. However, here are some comments/suggestions:

MBAR requires the bias energy to be expressed in kT units. For example:

def bias_fcn(epert, beta, lam1, lam2, alpha, u0, w0):
    # ALPHA, U0, W0COEFF
    # Initialize ebias1 with zeros
    ebias1 = np.zeros_like(epert)
    
    # Check if alpha is greater than 0
    if alpha > 0:
        ee = 1 + np.exp(-alpha*(epert - u0))
        ebias1 = (lam2 - lam1) * np.log(ee) / alpha
    
    return beta * ( ebias1 + lam2 * epert + w0 )

where beta = 1/(kB T). AToM's .out files express energies in kcal/mol units, so for T=300 K, beta = 1./(1.986e-3*300.0) for all samples and states.

Multi-state free energy inference is based on calculating the bias energy of all samples at all states. I think your code does it only for the state in which the sample was collected. If the perturbation energy samples are collected in usamples, something like this would calculate the bias matrix u_kn required by pymbar.MBAR():

nstates = len(lambda1)
u_kn = np.zeros((nstates, len(usamples)))
for k in range(nstates):
      u_kn[k] = bias_fcn(usamples, bet[k], lambda1[k], lambda2[k], alpha[k], u0[k], w0[k])

The second argument of pymbar.MBAR(), N_k is the number of samples collected at each state k. See below.

The free energy analysis is performed for the two ATM legs separately and the binding free energy is the difference of the corresponding free energies. In your case, leg1's states correspond to DIRECTION > 0 (the first 11 states) and leg2's to DIRECTION < 0 (the second 11 states). So, for example, this will give you the lambda1's values for leg1:

lambda1 = np.array(config['LAMBDA1'], dtype=np.float64)[0:11]

And something like this would collect only leg1's samples from the .out files:

def read_u_energy(name: str, n_replicas: int = 22):
    usamples = []
    n_k = [ 0 for i in range(    int(n_replicas/2)  ) ]
    for k in range(n_replicas):
        lines = Path(f'r{k}/{name}.out').read_text().split('\n')
        for line in lines:
            values = line.split()
            if len(values) > 0:
                stateid = int( values[0] )
                direction = float( values[2] )
                if direction > 0:
                    usamples.append( float( values[9] ) )
                    n_k[ stateid ] += 1
    return ( np.array(usamples,dtype=np.float64), np.array(n_k,np.int32) )

Call it as:

(usamples, N_k) = read_u_energy(args.name, 22)

The DIRECTION value is stored in the third column. Only samples with DIRECTION > 0 are loaded. The function returns the perturbation energy samples and the number of samples collected at each state. Note that all replicas' .out files are scanned because leg1's samples can be found in any of the 22 replicas, even those started in leg2's states. Also, for production, consider discarding the first portion of the data for equilibration. For example, do for line in lines[20:] above to discard the first 20 samples of each replica.

With the above, I think this would calculate the free energy difference for leg1:

mbar = pymbar.MBAR(u_kn = u_kn, N_k=N_k)
print( mbar.compute_free_energy_differences() )

9527567 · 2024-07-15T03:13:19Z

Thank you, I have modified the analysis code.

import numpy as np
import argparse
import sys
from configobj import ConfigObj
from pathlib import Path
import pymbar
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument('-n', '--name', required=True, help='name')
parser.add_argument('-c', '--config', required=True, help='config file')
parser.add_argument('-s', '--start', required=False,
                    help='start sample', default=0)
parser.add_argument('-e', '--end', required=False,
                    help='end sample', default=sys.maxsize)

args = parser.parse_args()


def bias_fcn(epert, beta, lam1, lam2, alpha, u0, w0):
    # ALPHA, U0, W0COEFF
    # Initialize ebias1 with zeros
    ebias1 = np.zeros_like(epert)

    # Check if alpha is greater than 0
    if alpha > 0:
        ee = 1 + np.exp(-alpha*(epert - u0))
        ebias1 = (lam2 - lam1) * np.log(ee) / alpha

    return beta * (ebias1 + lam2 * epert + w0)


def read_u_energy(name: str, n_replicas: int = 22, start: int = 0, end: int = sys.maxsize):
    lig1_usamples = []
    lig2_usamples = []
    n_k1 = [0 for i in range(int(n_replicas/2))]
    n_k2 = [0 for i in range(int(n_replicas/2))]

    for k in range(n_replicas):
        lines = Path(f'r{k}/{name}.out').read_text().split('\n')
        for line in lines[start: end]:
            values = line.split()
            if len(values) > 0:
                stateid = int(values[0])
                direction = float(values[2])
                if direction > 0:
                    lig1_usamples.append(float(values[9]))
                    n_k1[stateid] += 1
                else:
                    lig2_usamples.append(float(values[9]))
                    n_k2[stateid-11] += 1
    return (np.array(lig1_usamples, dtype=np.float64), np.array(lig2_usamples, dtype=np.float64), np.array(n_k1, np.int32), np.array(n_k2, np.int32))


def read_config(config_file: str):
    config = ConfigObj(config_file)
    return config


if __name__ == "__main__":
    config = read_config(args.config)

    lambda1 = np.array(config['LAMBDA1'], dtype=np.float64)
    lambda2 = np.array(config['LAMBDA2'], dtype=np.float64)
    lambdas = np.array(config['LAMBDAS'], dtype=np.float64)

    alpha = np.array(config['ALPHA'], dtype=np.float64)
    u0 = np.array(config['U0'], dtype=np.float64)
    w0 = np.array(config['W0COEFF'], dtype=np.float64)
    (lig1_usamples, lig2_usamples, N_k1, N_k2) = read_u_energy(
        args.name, 22, int(args.start), int(args.end))
    nstates = 11
    u_kn1 = np.zeros((nstates, len(lig1_usamples)))
    u_kn2 = np.zeros((nstates, len(lig2_usamples)))

    for k in range(nstates):
        u_kn1[k] = bias_fcn(lig1_usamples, 1./(1.986e-3*300.0),
                            lambda1[k], lambda2[k], alpha[k], u0[k], w0[k])
    for k in range(11, 22):
        u_kn2[k-11] = bias_fcn(lig2_usamples, 1./(1.986e-3*300.0),
                               lambda1[k], lambda2[k], alpha[k], u0[k], w0[k])
    u_nk1 = pd.DataFrame(u_kn1)
    u_nk2 = pd.DataFrame(u_kn2)

    mbar1 = pymbar.MBAR(u_kn=u_kn1, N_k=N_k1, verbose=True,
                        solver_protocol='robust')
    mbar2 = pymbar.MBAR(u_kn=u_kn2, N_k=N_k2, verbose=True,
                        solver_protocol='robust')

    a = mbar1.compute_free_energy_differences()
    b = mbar2.compute_free_energy_differences()
    print(a['Delta_f'])
    print(b['Delta_f'])

Furthermore, can we use alchemlyb for analysis? But I don't know how to create its input as follows.

egallicc · 2024-07-17T17:54:24Z

Thank you.

Sorry, I am not familiar with alchemlyb's input. Presumably, it would be related to the output of bias_fcn() as for MBAR.

9527567 mentioned this issue Sep 18, 2024

How to create input without using a parser? alchemistry/alchemlyb#400

Closed

egallicc closed this as completed Sep 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The problem with pymbar analysis. #87

The problem with pymbar analysis. #87

9527567 commented Jul 7, 2024 •

edited

Loading

egallicc commented Jul 8, 2024 •

edited

Loading

9527567 commented Jul 15, 2024

egallicc commented Jul 17, 2024

The problem with pymbar analysis. #87

The problem with pymbar analysis. #87

Comments

9527567 commented Jul 7, 2024 • edited Loading

egallicc commented Jul 8, 2024 • edited Loading

9527567 commented Jul 15, 2024

egallicc commented Jul 17, 2024

9527567 commented Jul 7, 2024 •

edited

Loading

egallicc commented Jul 8, 2024 •

edited

Loading