GCDM-SBDD

Description

This is the official structure-based drug design (SBDD) codebase for the paper

Geometry-Complete Diffusion for 3D Molecule Generation and Optimization, Communications Chemistry (2024)

System requirements

OS requirements

This package supports Linux and has been tested on AlmaLinux release 8.9 (Midnight Oncilla).

Python dependencies

This package is developed and tested under Python 3.10.x. The primary Python packages and their versions are as follows. For more details, please refer to the environment.yaml file.

hydra-core=1.3.2
matplotlib-base=3.7.1
numpy=1.24.3
pyg=2.3.0=py310_torch_2.0.0_cu118
python=3.10.11
pytorch=2.0.1=py3.10_cuda11.8_cudnn8.7.0_0
pytorch-scatter=2.1.1=py310_torch_2.0.0_cu118
pytorch-lightning=2.0.2
scikit-learn=1.2.2
torchmetrics=0.11.4
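
As a quick sanity check after installation, the pins above can be compared against what is actually installed. The following is a minimal sketch, not part of the repository: the `PINNED` dictionary and the `find_mismatches` helper are hypothetical names, and the pip distribution names are assumed to match the conda package names listed above (PyTorch and PyG are left out because their version strings may carry build suffixes).

```python
from importlib import metadata

# Pins copied from the list above, keyed by pip distribution name.
# Assumption: these pip names match the conda package names.
PINNED = {
    "hydra-core": "1.3.2",
    "numpy": "1.24.3",
    "pytorch-lightning": "2.0.2",
    "scikit-learn": "1.2.2",
    "torchmetrics": "0.11.4",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (expected, found)} for every package that differs.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed in this environment
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means the checked packages match the pins; environment.yaml remains the authoritative list.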

Installation guide

Install mamba (~500 MB: ~1 minute)

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh  # accept all terms and install to the default location
rm Mambaforge-$(uname)-$(uname -m).sh  # (optionally) remove installer after using it
source ~/.bashrc  # alternatively, one can restart their shell session to achieve the same result

Install dependencies (~15 GB: ~10 minutes)

# clone project
git clone https://github.com/BioinfoMachineLearning/GCDM-SBDD
cd GCDM-SBDD

# create conda environment
mamba env create -f environment.yaml
conda activate GCDM-SBDD  # note: one still needs to use `conda` to (de)activate environments

# install local project as package
pip3 install -e .

Download checkpoints (~500 MB extracted: ~2 minutes)

Note: Make sure you are in the project's root directory beforehand (e.g., ~/GCDM-SBDD/).

# fetch and extract model checkpoints directory
wget https://zenodo.org/record/13375913/files/GCDM_SBDD_Checkpoints.tar.gz
tar -xzf GCDM_SBDD_Checkpoints.tar.gz
rm GCDM_SBDD_Checkpoints.tar.gz

NOTE: Trained EGNN baseline checkpoint files are also included in GCDM_SBDD_Checkpoints.tar.gz.

QuickVina 2

For docking, download QuickVina 2 and copy it to your Conda environment's binary (bin) directory:

wget https://github.com/QVina/qvina/raw/master/bin/qvina2.1
chmod +x qvina2.1
mv qvina2.1 $HOME/mambaforge/envs/GCDM-SBDD/bin
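
To confirm that the binary ended up somewhere the active environment can find it, one option is a small lookup on PATH. This is only an illustrative sketch; the `locate` helper is a hypothetical name and is not part of the repository.

```python
import shutil


def locate(executable):
    """Return the absolute path of `executable` on PATH, or None if absent."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = locate("qvina2.1")
    print(path if path else "qvina2.1 is not on PATH; is the environment active?")
```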

MGLTools is needed to prepare the receptor for docking (PDB -> PDBQT), but it can interfere with your Mamba environment, so we recommend installing it in a separate environment:

mamba create -n mgltools -c bioconda mgltools

Binding MOAD

Data preparation

Download the dataset

wget https://zenodo.org/record/13375913/files/every_part_a.zip
wget https://zenodo.org/record/13375913/files/every_part_b.zip
wget https://zenodo.org/record/13375913/files/every.csv

unzip every_part_a.zip
unzip every_part_b.zip

Process the raw data using

python process_bindingmoad.py <bindingmoad_dir>

or, to suppress warnings,

python -W ignore process_bindingmoad.py <bindingmoad_dir>

CrossDocked Benchmark

Data preparation

Download and extract the dataset as described by the authors of Pocket2Mol: https://github.com/pengxingang/Pocket2Mol/tree/main/data

Process the raw data using

python process_crossdock.py <crossdocked_dir> --no_H

Finetuning Set

Data preparation

Download your desired finetuning dataset of protein-ligand complexes, each stored in a single PDB file (e.g., 1a4z_LIG:A:403.pdb), to data/finetuning_set, and then run

python process_finetuning_set.py <finetuning_set_dir>

or, to suppress warnings,

python -W ignore process_finetuning_set.py <finetuning_set_dir>
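
Before processing, it can help to check that every file name in the finetuning set follows the same pattern as the example above. The sketch below assumes the layout `<pdb_id>_<ligand>:<chain>:<resi>.pdb`, which is inferred only from the single example `1a4z_LIG:A:403.pdb` and is not a documented specification; `ComplexName` and `parse_complex_name` are hypothetical helpers, not functions from this repository.

```python
from pathlib import Path
from typing import NamedTuple


class ComplexName(NamedTuple):
    pdb_id: str
    ligand: str
    chain: str
    resi: int


def parse_complex_name(filename):
    """Split '<pdb_id>_<ligand>:<chain>:<resi>.pdb' into its fields.

    The field layout is an assumption based on one example file name.
    """
    name = Path(filename).name
    if not name.endswith(".pdb"):
        raise ValueError(f"expected a .pdb file, got {filename!r}")
    stem = name[: -len(".pdb")]
    pdb_id, sep, ligand_spec = stem.partition("_")
    parts = ligand_spec.split(":")
    if not sep or len(parts) != 3:
        raise ValueError(f"unexpected file name layout: {filename!r}")
    ligand, chain, resi = parts
    return ComplexName(pdb_id, ligand, chain, int(resi))
```

For instance, `parse_complex_name("data/finetuning_set/1a4z_LIG:A:403.pdb")` yields the fields `1a4z`, `LIG`, `A`, and `403`, while a name that does not fit the assumed pattern raises `ValueError`.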

Tutorials

We provide a two-part tutorial series of Jupyter notebooks (see the notebooks directory) that walks through a real-world example of using GCDM-SBDD for pocket-based molecule generation and filtering.

Demo

Sample molecules for a given pocket

To sample small molecules for a given pocket with a trained model, use the following command:

python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outdir <output_dir> --resi_list <list_of_pocket_residue_ids>

For example:

python generate_ligands.py last.ckpt --pdbfile 1abc.pdb --outdir results/ --resi_list A:1 A:2 A:3 A:4 A:5 A:6 A:7

Alternatively, the binding pocket can also be specified based on a reference ligand in the same PDB file:

python generate_ligands.py <checkpoint>.ckpt --pdbfile <pdb_file>.pdb --outdir <output_dir> --ref_ligand <chain>:<resi>

Optional flags:

| Flag | Description |
| --- | --- |
| `--n_samples` | Number of sampled molecules |
| `--all_frags` | Keep all disconnected fragments |
| `--sanitize` | Sanitize molecules (invalid molecules will be removed if this flag is present) |
| `--relax` | Relax generated structure in force field |
| `--resamplings` | Inpainting parameter (doesn't apply if conditional model is used) |
| `--jump_length` | Inpainting parameter (doesn't apply if conditional model is used) |
Instructions for use

Training

Starting a new training run:

python -u train.py config=<config>.yml

Resuming a previous run:

python -u train.py config=<config>.yml resume=<checkpoint>.ckpt

Reproduce paper results

test.py can be used to sample molecules for the entire testing set:

python test.py <checkpoint>.ckpt --test_dir <bindingmoad_dir>/processed_noH/test/ --outdir <output_dir> --fix_n_nodes

The optional --fix_n_nodes flag makes the model produce ligands with the same number of nodes as the original molecule. The other optional flags are identical to those of generate_ligands.py.

Compute sample metrics

To assess basic molecular properties, create an instance of the MoleculeProperties class and run its evaluate method:

from analysis.metrics import MoleculeProperties
mol_metrics = MoleculeProperties()
all_qed, all_sa, all_logp, all_lipinski, per_pocket_diversity = \
    mol_metrics.evaluate(pocket_mols)

evaluate() expects a list of lists where the inner list contains all RDKit molecules generated for one pocket.
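
One way to build such a list of lists from sampled SDF files is sketched below. It assumes, hypothetically, that each SDF file holds all molecules generated for a single pocket; the actual output layout of test.py may differ. `load_with_rdkit` and `build_pocket_mols` are illustrative helpers and not part of the repository.

```python
from pathlib import Path


def load_with_rdkit(sdf_path):
    """Read all molecules from an SDF file (RDKit yields None for unparsable entries)."""
    from rdkit import Chem  # imported lazily so build_pocket_mols has no hard dependency
    return list(Chem.SDMolSupplier(str(sdf_path)))


def build_pocket_mols(sdf_paths, load_molecules=load_with_rdkit):
    """Return a list of lists with one inner list of molecules per SDF file.

    Assumption: one SDF file corresponds to one pocket. Unparsable
    molecules (None) are dropped, and files with no valid molecule are skipped.
    """
    pocket_mols = []
    for path in sorted(sdf_paths):
        mols = [mol for mol in load_molecules(path) if mol is not None]
        if mols:
            pocket_mols.append(mols)
    return pocket_mols
```

With that layout, `pocket_mols = build_pocket_mols(Path("<output_dir>").glob("*.sdf"))` would produce the input expected by `evaluate()`.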

For computing docking scores, run QuickVina as described below.

Run QuickVina2

First, convert all protein PDB files to PDBQT files using MGLTools:

conda activate mgltools
cd analysis
python2 docking_py27.py <bindingmoad_dir>/processed_noH/test/ <output_dir> bindingmoad
cd ..
conda deactivate

Then, compute QuickVina scores:

conda activate GCDM-SBDD
python3 analysis/docking.py --pdbqt_dir <docking_py27_outdir> --sdf_dir <test_outdir> --out_dir <qvina_outdir> --write_csv --write_dict --dataset moad

NOTE: One can reference analysis/inference_analysis.py and analysis/molecule_analysis.py to analyze the generated molecules.

Docker

To build this project in a Docker container, you can use the following commands:

## Build the image
docker build -t gcdm-sbdd .

## Run the container (with GPUs and mounting the current directory)
docker run -it --gpus all -v .:/mnt --name gcdm-sbdd gcdm-sbdd

This Docker image is also available on Docker Hub at cford38/gcdm-sbdd, which can be run with the following command:

# docker pull cford38/gcdm-sbdd

docker run -it --gpus all -v .:/mnt --name gcdm-sbdd cford38/gcdm-sbdd

(Note: This image includes the checkpoints in the main working directory /software/GCDM-SBDD/checkpoints/.)

Acknowledgements

GCDM-SBDD builds upon the source code and data from several prior projects. We thank all their contributors and maintainers!

License

This project is covered under the MIT License.

Citation

If you use the code or data associated with this package or otherwise find this work useful, please cite:

@article{morehead2024geometry,
  title={Geometry-complete diffusion for 3D molecule generation and optimization},
  author={Morehead, Alex and Cheng, Jianlin},
  journal={Communications Chemistry},
  volume={7},
  number={1},
  pages={150},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
