The code has been tested in the following environment:
conda create -n tagmol python=3.8.17
conda activate tagmol
conda install pytorch=1.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pyg=2.2.0 -c pyg
conda install rdkit=2022.03.2 openbabel=3.1.1 tensorboard=2.13.0 pyyaml=6.0 easydict=1.9 python-lmdb=1.4.1 -c conda-forge
# For Vina Docking
pip install meeko==0.1.dev3 scipy pdb2pqr vina==1.2.2
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
IMPORTANT NOTE: You may need to append the root of the working directory to `PYTHONPATH`:
export PYTHONPATH=".":$PYTHONPATH
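As an optional sanity check (not part of the repo), the following commands confirm that the main dependencies installed above import correctly and that the GPU build of PyTorch can see a device:

```bash
# Optional sanity check (not part of the repo): confirm the core dependencies
# import and that the GPU build of PyTorch can see a device.
python -c "import torch; print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())"
python -c "import torch_geometric; print('pyg', torch_geometric.__version__)"
python -c "import rdkit; from openbabel import pybel; print('rdkit', rdkit.__version__, '| openbabel OK')"
```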
The resources can be found here. The data are inside the `data` directory, the backbone model is inside `pretrained_models`, and the guide checkpoints are inside `logs`.
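Assuming the downloaded resources are placed at the repository root, a quick check that they are in the expected locations (only the directory names are taken from the description above; the files inside each directory are not listed here):

```bash
# Quick check that the downloaded resources are in place (directory names as
# described above).
ls data pretrained_models logs
```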
To train the backbone diffusion model:
python scripts/train_diffusion.py configs/training.yml
To train the guide models (binding affinity, QED, and SA):
python scripts/train_dock_guide.py configs/training_dock_guide.yml
python scripts/train_dock_guide.py configs/training_dock_guide_qed.yml
python scripts/train_dock_guide.py configs/training_dock_guide_sa.yml
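As a small convenience (this loop is not part of the repo), the three guide models can also be trained back-to-back using the same configs:

```bash
# Optional convenience loop (not part of the repo): train the three guide models
# sequentially with the configs listed above.
for cfg in training_dock_guide training_dock_guide_qed training_dock_guide_sa; do
    python scripts/train_dock_guide.py "configs/${cfg}.yml"
done
```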
NOTE: The outputs are saved in `logs/` by default.
To sample molecules for a single target in the test set:
python scripts/sample_diffusion.py configs/sampling.yml --data_id {i} # Replace {i} with the index of the data. i should be between 0 and 99 for the test set.
We also provide a bash script that runs inference for the entire test set in a loop:
bash scripts/batch_sample_diffusion.sh configs/sampling.yml backbone
The output will be stored in `experiments/backbone`.
The following variables can be modified in the script file if required: `BATCH_SIZE`, `NODE_ALL`, `NODE_THIS`, and `START_IDX`.
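For orientation, here is a minimal sketch of how such a batch script might use these variables, assuming `NODE_ALL`/`NODE_THIS` split the 100 test-set indices across parallel workers; consult `scripts/batch_sample_diffusion.sh` for the actual logic and flag names.

```bash
# Minimal sketch (NOT the actual scripts/batch_sample_diffusion.sh): partition the
# 100 test-set targets across NODE_ALL parallel workers, where this worker handles
# every index congruent to NODE_THIS. BATCH_SIZE and OUTPUT_TAG are consumed by the
# real script (e.g. routing outputs to experiments/$OUTPUT_TAG); they are only
# declared here because the exact flags they map to are not guessed.
CONFIG=$1            # e.g. configs/sampling.yml
OUTPUT_TAG=$2        # e.g. backbone

BATCH_SIZE=32        # sampling batch size (value assumed for illustration)
NODE_ALL=1           # total number of parallel workers
NODE_THIS=0          # index of this worker (0-based)
START_IDX=0          # first test-set index to process

for ((i = START_IDX; i < 100; i++)); do
    if (( i % NODE_ALL == NODE_THIS )); then
        python scripts/sample_diffusion.py "$CONFIG" --data_id "$i"
    fi
done
```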
To run guided sampling for a single target:
python scripts/sample_multi_guided_diffusion.py [path-to-config.yml] --data_id {i} # Replace {i} with the index of the data. i should be between 0 and 99 for the test set.
To run inference on all 100 targets in the test set:
bash scripts/batch_sample_multi_guided_diffusion.sh [path-to-config.yml] [output-dir-name]
When run using the bash script, the outputs are stored in `experiments_multi/[output-dir-name]`. The config files are available in `configs/noise_guide_multi`:
- Single-objective guidance
  - BA: `sampling_guided_ba_1.yml`
  - QED: `sampling_guided_qed_1.yml`
  - SA: `sampling_guided_sa_1.yml`
- Dual-objective guidance
  - QED + BA: `sampling_guided_qed_0.5_ba_0.5.yml`
  - SA + BA: `sampling_guided_sa_0.5_ba_0.5.yml`
  - QED + SA: `sampling_guided_qed_0.5_sa_0.5.yml`
- Multi-objective guidance (our main model)
  - QED + SA + BA: `sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml`
For example, to run the multi-objective setting (i.e., our model):
bash scripts/batch_sample_multi_guided_diffusion.sh configs/noise_guide_multi/sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml qed_0.33_sa_0.33_ba_0.34
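To sweep every guidance setting listed above in one go, a simple wrapper loop (not part of the repo) could derive the output directory name from each config file name:

```bash
# Optional wrapper (not part of the repo): run batch sampling for every guidance
# config in configs/noise_guide_multi, naming each output directory after the
# weights in its config file name (e.g. qed_0.33_sa_0.33_ba_0.34).
for cfg in configs/noise_guide_multi/sampling_guided_*.yml; do
    name=$(basename "$cfg" .yml)                 # e.g. sampling_guided_qed_1
    bash scripts/batch_sample_multi_guided_diffusion.sh "$cfg" "${name#sampling_guided_}"
done
```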
To evaluate a trained guide checkpoint:
python scripts/eval_dock_guide.py --ckpt_path [path-to-checkpoint.pt]
To evaluate the generated molecules:
python scripts/evaluate_diffusion.py {OUTPUT_DIR} --docking_mode vina_score --protein_root data/test_set
The docking mode can be chosen from `{qvina, vina_score, vina_dock, none}`.
NOTE: The first time you run the evaluation code with the vina_score or vina_dock docking mode, it will take some time to prepare the pdbqt and pqr files.
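For example, to score the multi-objective samples produced by the sampling command shown earlier (adjust the directory name if you used a different output tag):

```bash
# Example: evaluate the multi-objective run generated in the sampling step above.
# Replace the directory with your own output path if it differs.
python scripts/evaluate_diffusion.py experiments_multi/qed_0.33_sa_0.33_ba_0.34 \
    --docking_mode vina_score --protein_root data/test_set
```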
| Methods | Vina Score Avg. (↓) | Vina Score Med. (↓) | Vina Min Avg. (↓) | Vina Min Med. (↓) | Vina Dock Avg. (↓) | Vina Dock Med. (↓) | High Affinity Avg. (↑) | High Affinity Med. (↑) | QED Avg. (↑) | QED Med. (↑) | SA Avg. (↑) | SA Med. (↑) | Diversity Avg. (↑) | Diversity Med. (↑) | Hit Rate % (↑) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference | -6.36 | -6.46 | -6.71 | -6.49 | -7.45 | -7.26 | - | - | 0.48 | 0.47 | 0.73 | 0.74 | - | - | 21 |
| liGAN | - | - | - | - | -6.33 | -6.20 | 21.1% | 11.1% | 0.39 | 0.39 | 0.59 | 0.57 | 0.66 | 0.67 | 13.2 |
| AR | -5.75 | -5.64 | -6.18 | -5.88 | -6.75 | -6.62 | 37.9% | 31.0% | 0.51 | 0.50 | 0.63 | 0.63 | 0.70 | 0.70 | 12.9 |
| Pocket2Mol | -5.14 | -4.70 | -6.42 | -5.82 | -7.15 | -6.79 | 48.4% | 51.0% | 0.56 | 0.57 | 0.74 | 0.75 | 0.69 | 0.71 | 24.3 |
| TargetDiff | -5.47 | -6.30 | -6.64 | -6.83 | -7.80 | -7.91 | 58.1% | 59.1% | 0.48 | 0.48 | 0.58 | 0.58 | 0.72 | 0.71 | 20.5 |
| DecompDiff | -4.85 | -6.03 | -6.76 | -7.09 | -8.48 | -8.50 | 64.8% | 78.6% | 0.44 | 0.41 | 0.59 | 0.59 | 0.63 | 0.62 | 24.9 |
| TAGMol | -7.02 | -7.77 | -7.95 | -8.07 | -8.59 | -8.69 | 69.8% | 76.4% | 0.55 | 0.56 | 0.56 | 0.56 | 0.69 | 0.70 | 27.7 |
Due to space constraints, we only share the `eval_results` folder generated by the evaluation script. It can be found at the same link as the other resources, inside the `results` directory.
@article{dorna2024tagmol,
title={TAGMol: Target-Aware Gradient-guided Molecule Generation},
author={Vineeth Dorna and D. Subhalingam and Keshav Kolluru and Shreshth Tuli and Mrityunjay Singh and Saurabh Singal and N. M. Anoop Krishnan and Sayan Ranu},
journal={arXiv preprint arXiv:2406.01650},
year={2024}
}
This codebase was built on top of TargetDiff.