NeuralPLexer: State-specific protein-ligand complex structure prediction with a multi-scale deep generative model

Organization

This fork focuses on using structures output by NeuralPLexer to predict binding affinities on the PDBBind dataset; the relevant files are located in ./sf/. To generate the ligand conformers needed for the model, use this fork of the torsional-diffusion repo: https://github.com/shehan807/torsional-diffusion. The scripts are run roughly in the order listed below (a command-line sketch of the pipeline follows the list):

  1. ./sf/pl.py generates protein-ligand (PL) complex conformers.
  2. ./sf/protein.py generates protein-only (P) conformers.
  3. ./sf/torsional_diffusion_smiles_csv.py generates ligand (L) conformers. Note that you must clone and install a forked version of the torsional-diffusion code and download its pre-trained model: https://github.com/Awallace3/torsional-diffusion
  4. ./sf/data_loader.py packages the training and validation data into a dataloader format.
  5. ./sf/train_affinety.py trains models. Training specific models requires different arguments to AffiNETy(); examples that re-create the models in the report are in ./sf/train_affinety_boltz_avg.py, ./sf/train_affinety_boltz_avg_Q.py, and ./sf/train_affinety_boltz_mlp.py.
  6. ./sf/src/models.py contains the definitions of all models investigated for the report.
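
A minimal command-line sketch of the assumed pipeline order, based only on the script names above; the exact flags, input paths, and any required arguments are assumptions and may differ from the actual interfaces in ./sf/:

# steps 1-3: generate PL complex, protein-only, and ligand conformers
python ./sf/pl.py
python ./sf/protein.py
python ./sf/torsional_diffusion_smiles_csv.py   # requires the Awallace3/torsional-diffusion fork
# step 4: assemble the conformers into training/validation dataloaders
python ./sf/data_loader.py
# step 5: train an AffiNETy variant (see train_affinety_boltz_*.py for the report configurations)
python ./sf/train_affinety.py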

Installation (Group Members)

  1. Clone my fork
  2. Create the conda environment
  3. Install the dev version
  4. Copy the zip file of pre-trained models from my share directory
  5. Unzip the archive
  6. Run the first example

git clone git@github.com:Awallace3/NeuralPLexer.git && cd NeuralPLexer
conda env create -f npl.yml
pip install -e .
cp /storage/hive/project/chem-sherrill/awallace43/share/neuralplexermodels_downstream_datasets_predictions.zip .
unzip neuralplexermodels_downstream_datasets_predictions.zip
cd example/1Y0L_NPLexer && bash run.sh

NeuralPLexer

Official implementation of NeuralPLexer, a deep generative model to jointly predict protein-ligand complex 3D structures and beyond.

[Demo animation: docs/demo2_122023.gif]

Reference

Qiao Z, Nie W, Vahdat A, Miller III TF, Anandkumar A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Nature Machine Intelligence, 2024. https://doi.org/10.1038/s42256-024-00792-z.

Pretrained model checkpoints described in the published manuscript, downstream evaluation datasets, and predicted structures are available at the following Zenodo repository for non-commercial usage under the CC BY-NC-SA 4.0 license: https://doi.org/10.5281/zenodo.10373581.
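
A minimal sketch for retrieving and unpacking the archive from this Zenodo record; the download URL path and archive name are assumptions (the name matches the archive referenced in the fork-specific installation steps above):

wget https://zenodo.org/records/10373581/files/neuralplexermodels_downstream_datasets_predictions.zip
unzip neuralplexermodels_downstream_datasets_predictions.zip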

Installation

A GPU machine with CUDA>=10.2 support is required to run the model. For a Linux environment, the following commands can be used to install the package:

make environment
make install

Model inference for new protein-ligand pairs

Example usage for the base model with a template structure in PDB format:

neuralplexer-inference --task=batched_structure_sampling \
                       --input-receptor input.pdb \
                       --input-ligand <ligand>.sdf \
                       --use-template  --input-template <template>.pdb \
                       --out-path <output_path> \
                       --model-checkpoint <data_dir>/models/complex_structure_prediction.ckpt \
                       --n-samples 16 \
                       --chunk-size 4 \
                       --num-steps=40 \
                       --cuda \
                       --sampler=langevin_simulated_annealing

The NeuralPLexer CLI supports predicting biological complexes without ligands, with a single ligand, or with multiple ligands (e.g., substrate-cofactor systems), and with receptors of one or more protein chains. Common input options are:

  • input-receptor and input-ligand are the input protein and ligand structures;
    • input-receptor can be either a PDB file or protein sequences. If the input is a multi-chain protein given as primary sequences, the chains should be separated by a | sign; if the input is a PDB file, no coordinate information from the file is used for generation unless the same file is also provided as a template structure via input-template.
    • input-ligand can be either SDF files or SMILES strings. If the input is a multi-ligand complex, the ligands should be separated by a | sign (see the example after this list);
  • use-template and input-template control whether and which template structure is used for the input protein;
  • out-path is the output directory to store the predicted structures;
  • model-checkpoint is the path to the trained model checkpoint;
  • n-samples is the total number of conformations to generate;
  • chunk-size is the number of conformations to generate in parallel;
  • num-steps is the number of steps for the diffusion part of the sampling process;
  • separate-pdb determines whether to output the predicted protein structures into dedicated PDB files;
  • rank-outputs-by-confidence determines whether to rank-order the predicted ligand (and potentially protein) output files, where outputs are ranked by the predicted ligand confidence if available and by the predicted protein confidence otherwise.
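
For example, a sequence-input, multi-ligand run might look like the sketch below; the chain sequences, ligand SMILES, and paths are placeholders, and treating rank-outputs-by-confidence as a boolean flag is an assumption:

neuralplexer-inference --task=batched_structure_sampling \
                       --input-receptor "<chain_A_sequence>|<chain_B_sequence>" \
                       --input-ligand "<ligand_1_SMILES>|<ligand_2_SMILES>" \
                       --out-path <output_path> \
                       --model-checkpoint <data_dir>/models/complex_structure_prediction.ckpt \
                       --n-samples 8 \
                       --chunk-size 4 \
                       --num-steps=40 \
                       --rank-outputs-by-confidence \
                       --cuda \
                       --sampler=langevin_simulated_annealing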

Expected outputs under <output_path>:

  • prot_all.pdb and lig_all.sdf contain the output geometries of all n_samples predicted conformations of the biological assembly;
    • prot_0.pdb, prot_1.pdb, ... store the individual frames of the predicted protein conformations;
    • lig_0.sdf, lig_1.sdf, ... store the individual frames of the predicted ligand conformations.

In benchmark_tiny.sh we also provide minimal example commands for running complex generation over many distinct input sets using data provided in the Zenodo repo, analogous to the process used to obtain the benchmarking results but with a reduced number of samples, denoising steps, and template choices.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
