COMA: Efficient Structure-constrained Molecular Generation using Contractive and Margin losses

COMA: efficient structure-constrained molecular generation using COntractive and MArgin losses

  • Latest update: 31 Oct 2023


This repository is for COMA, a structure-constrained molecular generative model.

For a given source molecule, COMA generates a novel molecule with improved chemical properties by making a small modification to the source structure.

To achieve property improvement and high structural similarity simultaneously, COMA exploits reinforcement learning and metric learning.

For more details, please refer to J. Choi, S. Seo, and S. Park. COMA: efficient structure-constrained molecular generation using contractive and margin losses. J Cheminform 15, 8 (2023). https://doi.org/10.1186/s13321-023-00679-y


SYSTEM REQUIREMENTS:

  • (If a GPU is available) COMA may require more than 6 GB of GPU memory.
    • Supported cudatoolkit versions: 10.2, 11.1, and 11.3
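
The memory guideline above can be checked before training. The following is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the PATH. The helper names (`parse_total_memory_mib`, `query_total_memory_mib`, `gpus_meeting_requirement`) and the reading of "6GB" as 6 × 1024 MiB are illustrative assumptions and are not part of COMA.

```python
import shutil
import subprocess

# README guideline: COMA may need more than 6 GB of GPU memory.
# Treating "6GB" as 6 * 1024 MiB is an assumption of this sketch.
MIN_MEMORY_MIB = 6 * 1024


def parse_total_memory_mib(text):
    """Parse one integer (MiB) per line, as printed by nvidia-smi with 'nounits'."""
    return [int(line.strip()) for line in text.splitlines() if line.strip()]


def query_total_memory_mib():
    """Return total memory per GPU in MiB, or an empty list if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_memory_mib(result.stdout)


def gpus_meeting_requirement(memory_mib, minimum=MIN_MEMORY_MIB):
    """Return the indices of GPUs whose total memory exceeds the minimum."""
    return [i for i, mib in enumerate(memory_mib) if mib > minimum]
```

Calling `gpus_meeting_requirement(query_total_memory_mib())` returns the indices of suitable GPUs, or an empty list on a machine without `nvidia-smi`.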

Installation:

  • We recommend installing via Anaconda (https://www.anaconda.com/)

  • After installing Anaconda, please create a conda environment with the following commands:

git clone https://github.com/mathcom/COMA.git
cd COMA
conda env create -f environment.yml
  • To reproduce the results in our paper, use the legacy environment file instead.
    • The DRD2 oracle works only with Python 3.7
    • All properties other than DRD2 work with the latest Python and PyTorch
conda env create -f environment_legacy.yml

Data:

  • Before running the tutorials, decompress the archive files: data/{name}.tar.gz

  • The following commands are for decompression:

cd data
tar -xzvf drd2.tar.gz
tar -xzvf qed.tar.gz
tar -xzvf logp04.tar.gz
tar -xzvf logp06.tar.gz
cd ..
  • After decompression, the following files are available and the provided scripts are ready to run.

    • rdkit_test.txt
    • rdkit_train_pairs.txt
    • rdkit_train_src.txt
    • rdkit_train_tar.txt
    • rdkit_train_triplet.txt
    • rdkit_valid.txt
  • The procedure for creating a triplet dataset is described in Algorithm S3 of our paper.
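
The decompression step and the file listing above can also be scripted. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and because the directory layout inside each archive is not stated here, the check searches recursively instead of assuming a fixed path.

```python
import tarfile
from pathlib import Path

# File names as listed in this README. The layout inside each archive is
# an assumption-free unknown here, so files are looked up recursively.
EXPECTED_FILES = [
    "rdkit_test.txt",
    "rdkit_train_pairs.txt",
    "rdkit_train_src.txt",
    "rdkit_train_tar.txt",
    "rdkit_train_triplet.txt",
    "rdkit_valid.txt",
]


def extract_archives(data_dir):
    """Extract every *.tar.gz in data_dir into data_dir; return the archive names."""
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(data_dir)
        extracted.append(archive.name)
    return extracted


def missing_files(dataset_dir):
    """Return the expected file names that are not found under dataset_dir."""
    found = {path.name for path in Path(dataset_dir).rglob("*.txt")}
    return [name for name in EXPECTED_FILES if name not in found]
```

Running `extract_archives("data")` from the repository root and then calling `missing_files` on each extracted location gives a quick check that a dataset is complete; an empty list means all six files were found.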


Scripts:

  • We provide several Jupyter notebooks that explain how to use COMA.

    • 1_pretraining.ipynb
    • 2_finetuning.ipynb
    • 3_latent_space_analysis.ipynb
    • 4_generation.ipynb
    • 5_evaluation.ipynb
    • 6_drawing_molecules.ipynb
  • A user can open them using the following commands:

conda activate coma
jupyter notebook

~ run tutorial ~

conda deactivate
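
As an alternative to opening the notebooks interactively, they can be executed from a script. The following is a minimal sketch, assuming it runs inside the activated `coma` environment, that `jupyter nbconvert` is available there, that the notebooks sit in the current directory, and that they are meant to run in their numbered order. The function names are illustrative and not part of COMA.

```python
import subprocess

# Notebook names as listed in this README; running them in numbered order
# from the current directory is an assumption of this sketch.
NOTEBOOKS = [
    "1_pretraining.ipynb",
    "2_finetuning.ipynb",
    "3_latent_space_analysis.ipynb",
    "4_generation.ipynb",
    "5_evaluation.ipynb",
    "6_drawing_molecules.ipynb",
]


def nbconvert_command(notebook):
    """Build a command that executes one notebook and saves its outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_tutorials(notebooks=NOTEBOOKS, dry_run=True):
    """Return the commands in order; execute them only when dry_run is False."""
    commands = [nbconvert_command(nb) for nb in notebooks]
    if not dry_run:
        for command in commands:
            # check=True stops the sequence at the first failing notebook.
            subprocess.run(command, check=True)
    return commands
```

With the default `dry_run=True` the function only returns the commands, which is useful for inspecting them before committing to a long training run; `run_tutorials(dry_run=False)` executes them.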

Source code:


Contact:

