The BioFTS package is Python package for field theoretic simulations (FTS) of polymer field theories through Complex-Langevin sampling written by Jonas Wessén. The code is applicable to a large class of polymer field theories but a particular goal of the package is simulations of liquid-liquid phase-separation of intrinsically disordered proteins (IDPs) and other processes relevant to biomolecular condensates.
If you are new to polymer field theories and FTS, the following references will provide a good starting point:
- G. H. Fredrickson, V. Ganesan and F. Drolet, Field-Theoretic Computer Simulation Methods for Polymers and Complex Fluids, Macromolecules 2002, 35, 1, 16–39. https://doi.org/10.1021/ma011515t
- V. Ganesan and G. H. Fredrickson, Field-theoretic polymer simulations, 2001 EPL 55 814. https://doi.org/10.1209/epl/i2001-00353-8
- M. W. Matsen (2005). Self-Consistent Field Theory and Its Applications. In Soft Matter (eds G. Gompper and M. Schick). https://doi.org/10.1002/9783527617050.ch2
For the serious reader, the following book beautifully covers the topic:
- G. H. Fredrickson and K. T. Delaney. (2023). Field-Theoretic Simulations in Soft Matter and Quantum Fluids. Oxford University Press.
For an application of FTS to IDPs with both long-range electrostatic interactions and short-range residue-specific interactions, see:
- J. Wessén, S. Das., T. Pal, H. S. Chan. Analytical Formulation and Field-Theoretic Simulation of Sequence-Specific Phase Separation of Protein-Like Heteropolymers with Short- and Long-Spatial-Range Interactions, J. Phys. Chem. B 2022, 126, 9222−9245, https://doi.org/10.1021/acs.jpcb.2c06181
BioFTS can be used to study any polymer field theory described by a Hamiltonian on the form
Here,
In the corresponding explicit particle representation of the system, the non-bonded interaction energy is the sum over all monomer pairs, denoted
[If this formalism is unfamiliar to you, please have a look at the references above.]
The key functionality of BioFTS is to evolve the fields
where
It is possible in BioFTS to set the noise term
First, import the package along with any other packages you need:
import biofts
import numpy as np
Setting up a simulation in BioFTS is done through the following steps:
In FTS, each field results from a specific interaction type in the system, so the first step in setting up a BioFTS code is to define which interactions we want to include. BioFTS currently supports Yukawa-type interactions,
# Excluded volume contact interactions
v = 0.0068
excluded_volume = biofts.Contact(1./v)
# (Un-screened) Electrostatics
lB = 2. # Bjerrum length, lB = e^2 / (4 pi epsilon k_B T) in model units
kappa = 0.0 # Inverse Debye screening length
electrostatics = biofts.Yukawa(lB,kappa)
# Collect all interactions in a tuple. This will be used in step 2 to create the simulation box.
interactions = (excluded_volume,electrostatics)
You can define any number of interactions in this way.
A simulation box is created by specifying the number of lattice sites in each dimension, the side-lengths of the simulation box and the interactions in the system. For example,
# Define the grid
grid_dimensions = [16,16,80] # Number of lattice sites in each dimension. This can be a 1D, 2D, or 3D grid.
side_lengths = [8.,8.,40.] # Length of the simulation box in each dimension
# Create the simulation box
sb = biofts.SimulationBox(grid_dimensions,side_lengths,interactions)
If you have CUDA Python installed, you can use pass the argument use_GPU=True
to the SimulationBox
constructor to use the GPU to speed up the Complex-Langevin sampling.
Currently, BioFTS only supports linear bead-spring polymers where each monomer is associated with a set of generalized charges that governs its interactions with other monomers through the interactions defined in Step 1. For applications to IDPs, each monomer typically represents a residue in the protein sequence.
The single-molecule partition function for such a polymer species with
where
Here,
The following code snippet shows how to add a single polymer species, corresponding to a linear chain of E (glutamic acid) and K (lysine) residues:
aa_sequence = 'EKKKKKKEEKKKEEEEEKKKEEEKKKEKKEEKEKEEKEKKEKKEEKEEEE' # Amino-acid sequence
mol_id = 'sv10' # Name of the polymer species
N = len(aa_sequence) # Number of monomers in the chain
q = np.zeros( (2,N) ) # Generalized charges. The first index refers to interaction type, the second to monomer index.
# Excluded volume interactions
q[0,:] += 1. # The generalized charge for the excluded volume interaction is the monomer size, set to 1 for all monomers.
# Electrostatic interactions
aa_charges = {'E':-1,'K':1} # Electric charges for each amino acid type
q[1,:] = [ aa_charges[aa] for aa in aa_sequence ]
# Chain density
rho_bulk = 2.0 / N # rho_bulk is chain number density, n/V. Bead number density is n*N/V.
# Create the polymer species
a = 1./np.sqrt(6.) # Gaussian smearing length
b = 1. # Kuhn length
biofts.LinearPolymer(q,a,b,rho_bulk,sb,molecule_id=seq_label)
You can add any number of polymer species to the simulation box in this way. Note that LinearPolymer
object can be used to represent explicit salt ions and solvent particles.
For species in the grand canonical ensemble, you can supply the argument is_canonical=False
to the LinearPolymer
constructor. In this case, the rho_bulk
argument is interpreted as the activity of the species (i.e. the exponential of the chemical potential).
To run the simulation, we first need to create a ComplexLangevin
object which takes as input the time step dt
and the simulation box sb
. For example:
# Set-up the CL integrator
dt = 1e-3
cl = biofts.ComplexLangevinIntegrator(dt, sb)
For self-consistent field theory (SCFT), you can set the noise to zero by passing the additional argument noise=0
to the ComplexLangevinIntegrator
constructor.
The cl
object has a method run_ComplexLangevin(n_steps)
which evolves the fields in the simulation box for n_steps
time steps using a semi-implicit integration scheme. This method has two optional arguments sample_interval
and sampling_tasks
. The sampling_tasks
argument is a tuple where every elements represents an action to be taken at every sample_interval
steps. This can be used to e.g. compute and store observables.
BioFTS provides a number of built-in sampling tasks that can be used for this purpose. For example, the Monitor_Density_Profiles_Averaged_to_1d
creates a matplotlib
figure that visualizes the average monomer densities (for all molecule species in the system) along the last axis of the simulation box in real time. The figure is updated every sample_interval
steps. The following code snippet shows how to set up this task:
# Set-up visualization task
visualizer = biofts.Monitor_Density_Profiles_Averaged_to_1d(sb)
# Run the simulation
n_steps = 10000
sample_interval = 50
cl.run_ComplexLangevin(n_steps, sample_interval=sample_interval, sampling_tasks=(visualizer,))
This will show a figure that looks like this:
The top panel shows the density profile, i.e. the average monomer density along the last spatial coordinate
BioFTS currently provides the following built-in sampling tasks which can be created for a given simulation_box
object:
Monitor_Density_Profiles_Averaged_to_1d(simulation_box, species_to_plot=None, show_imaginary_part=False, pause_at_end=True)
: Monitors the average monomer densities along the last axis of the simulation box in real time. The optional argumentspecies_to_plot
is a list of integers representing the molecular species to be shown, e.g.species_to_plot=[0,1,3]
means that the first, second and fourth species that were added tosimulation_box
will be show (ifNone
, all species are plotted). Ifshow_imaginary_part=True
, the imaginary part of the density operator is also plotted as dashed curves. Ifpause_at_end=True
, the simulation pauses at the end of the run until the user closes the figure window.Save_Latest_Density_Profiles(simulation_box, data_directory='')
: Saves the density operators to a filedata_directory + 'density_profiles.npz
. The file is overwritten every time the task is called. You can load the densities asrho = np.load(data_directory + 'density_profiles.npz')
whererho[i]
will be the density operator for thei
th species.Save_1d_Density_Profiles(simulation_box, data_directory='', species_to_plot=None, remove_old_data_files=False)
: This creates one file per species,data_directory + 'density_profile_(mol id).txt'
, (wheremol id
is the name of the species). At every sampling point, a new line will be appended to the files where the first column is the current Complex-Langevin time, the second column is the step number, the third column is the real part of the chemical potential, and the remaining columns are the real part of the density operator averaged along the last axis of the simulation box. Ifspecies_to_plot
is notNone
, only the species with indices in integer listspecies_to_plot
will be saved. Ifremove_old_data_files=True
, the old data files will be removed before the new data is saved.Save_Field_Configuration(simulation_box, data_directory='', load_last_configuration=True)
: Saves the current field configuration to a filedata_directory + 'field_configuration.npy'
. Ifload_last_configuration=True
, the task will load the last saved field configuration at the beginning of the simulation. This can be useful for restarting a simulation from a previous state.
You will likely want to create your own sampling tasks to analyze the data and store the results relevant to your specific model and research question. To do this, you can use the following template class:
class MySamplingTask:
def __init__(self, simulation_box, ...):
self.sb = simulation_box
# Initialize any variables you need here
def sample(self, sample_index):
# This method is called every time the task is sampled
# Compute and store any observables you need here
pass
def finalize(self):
# This method is called at the end of the simulation
pass
At each sampling point, you can retrieve the current field configuration from the simulation_box
object as self.sb.Psi
which is an (n_fields, *grid_dimensions)
array. You can also retrieve the current number density operators as self.sb.species[i].rhob
and the generalized charge-density operators as self.sb.species[i].rho
where i
is the index of the species in the simulation box. The self.sb.species[i].rho
is an (n_fields, *grid_dimensions)
array.
To use your sampling task, simply add it to the tuple of sampling tasks in the sampling_tasks
argument of the run_ComplexLangevin
method as shown above.