This is the repository for MolGrapher: Graph-based Visual Recognition of Chemical Structures.
If you find this repository useful, please consider citing:
@InProceedings{Morin_2023_ICCV,
author = {Morin, Lucas and Danelljan, Martin and Agea, Maria Isabel and Nassar, Ahmed and Weber, Valery and Meijer, Ingmar and Staar, Peter and Yu, Fisher},
title = {MolGrapher: Graph-based Visual Recognition of Chemical Structures},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {19552-19561}
}
Publication in ICCV (DOI: https://doi.org/10.1109/iccv51070.2023.01791)
Publication in arXiv (DOI: https://doi.org/10.48550/arXiv.2308.12234)
Create a virtual environment.
conda create -n molgrapher python=3.11
conda activate molgrapher
Install MolGrapher and MolDepictor for CPU.
pip install -e .["cpu"]
Install MolGrapher and MolDepictor for GPU. (Tested on x86_64, Linux Ubuntu 20.04, CUDA 11.7, cuDNN 8.4)
pip install -e .["gpu"]
CUDA and cuDNN versions can be edited in setup.py.
To install and run MolGrapher using Docker, please refer to README_DOCKER.md.
Models are available on Hugging Face.
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_gcn_model.ckpt -P ./data/models/graph_classifier/
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_no_stereo_model.ckpt -P ./data/models/graph_classifier/
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/graph_classifier/gc_stereo_model.ckpt -P ./data/models/graph_classifier/
wget https://huggingface.co/ds4sd/MolGrapher/resolve/main/models/keypoint_detector/kd_model.ckpt -P ./data/models/keypoint_detector/
After downloading, the models folder from Hugging Face should be placed in ./data/.
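As a quick sanity check after downloading, the following minimal sketch reports which of the four checkpoint files named in the wget commands are not yet present under ./data/. The helper name missing_checkpoints is hypothetical and is not part of MolGrapher; only the file paths come from the commands above.

```python
from pathlib import Path

# Checkpoint files named in the wget commands, relative to ./data/.
EXPECTED_CHECKPOINTS = [
    "models/graph_classifier/gc_gcn_model.ckpt",
    "models/graph_classifier/gc_no_stereo_model.ckpt",
    "models/graph_classifier/gc_stereo_model.ckpt",
    "models/keypoint_detector/kd_model.ckpt",
]


def missing_checkpoints(data_dir="./data"):
    """Return the expected checkpoint paths that are not files under data_dir.

    Hypothetical helper for illustration; it only checks that the files exist,
    not that they are complete or loadable.
    """
    root = Path(data_dir)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


# Example, run from the repository root:
#   missing = missing_checkpoints()
#   print("all checkpoints found" if not missing else f"missing: {missing}")
```

An empty list means every expected file is in place. A non-empty list names the files to download again.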
Models can be selected by modifying attributes of GraphRecognizer (in ./molgrapher/models/graph_recognizer.py).
Your input images can be placed in the folder ./data/benchmarks/default/.
bash molgrapher/scripts/annotate/run.sh
Output predictions are saved in ./data/predictions/default/.
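The annotation steps above (place images, run the script, collect predictions) could be wrapped as in the sketch below. The function names stage_images and run_annotation are hypothetical. The list of image extensions is an assumption, since the supported formats are not stated here. Only the folder path and the run.sh command are taken from the instructions above.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: illustrative extensions only; the supported image formats are not documented here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def stage_images(src_dir, dest_dir="./data/benchmarks/default"):
    """Copy image files from src_dir into dest_dir and return the copied paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        # Skip subdirectories and files whose extension is not in the assumed set.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            copied.append(Path(shutil.copy2(path, dest / path.name)))
    return copied


def run_annotation(repo_root="."):
    """Run the annotation script from the repository root; raises if it exits non-zero."""
    subprocess.run(
        ["bash", "molgrapher/scripts/annotate/run.sh"],
        cwd=repo_root,
        check=True,
    )


# Example, run from the repository root:
#   stage_images("/path/to/my/images")
#   run_annotation()
#   # Predictions are then expected in ./data/predictions/default/
```

Keeping the staging step separate from the run step makes it possible to inspect the input folder before starting inference.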
USPTO-30K is available on Hugging Face.
- USPTO-10K contains 10,000 clean molecules, i.e. without any abbreviated groups.
- USPTO-10K-abb contains 10,000 molecules with superatom groups.
- USPTO-10K-L contains 10,000 clean molecules with more than 70 atoms.
The synthetic dataset is available on Hugging Face. Images and graphs are generated using MolDepictor.
To train the keypoint detector:
python3 ./molgrapher/scripts/train/train_keypoint_detector.py
To train the node classifier:
python3 ./molgrapher/scripts/train/train_graph_classifier.py