Conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher-order connections like multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNN) to predict molecular optoelectronic properties, where hyperedges represent conjugated structures.
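To make the hypergraph idea concrete, here is a minimal pure-Python sketch. It is not taken from the MHNN code. It assumes each bond comes with a conjugation flag (for example from RDKit's `Bond.GetIsConjugated()`). Every connected run of conjugated bonds is merged into one hyperedge, and all other bonds stay ordinary 2-atom hyperedges. The helper names `build_hyperedges` and `hypergraph_mean_pass` are hypothetical. The aggregation step is a generic node-to-hyperedge-to-node message-passing round with mean pooling, not the MHNN layer itself.

```python
from collections import defaultdict


def build_hyperedges(num_atoms, bonds, conjugated):
    """Group atoms into hyperedges (hypothetical helper, not the MHNN code).

    Assumption: each connected component of conjugated bonds forms one
    hyperedge, and every non-conjugated bond stays a 2-atom hyperedge.
    Returns a sorted list of sorted atom-index tuples.
    """
    parent = list(range(num_atoms))  # union-find over atoms

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge atoms that are linked by a conjugated bond.
    for (a, b), conj in zip(bonds, conjugated):
        if conj:
            parent[find(a)] = find(b)

    groups = defaultdict(set)  # root atom -> atoms of one conjugated system
    hyperedges = []
    for (a, b), conj in zip(bonds, conjugated):
        if conj:
            groups[find(a)].update((a, b))
        else:
            hyperedges.append((a, b))
    hyperedges.extend(tuple(g) for g in groups.values())
    return sorted(tuple(sorted(e)) for e in hyperedges)


def hypergraph_mean_pass(features, hyperedges):
    """One node -> hyperedge -> node round with mean aggregation."""
    dim = len(features[0])
    # Stage 1: each hyperedge takes the mean of its member atoms' features.
    edge_feats = [
        [sum(features[v][k] for v in e) / len(e) for k in range(dim)]
        for e in hyperedges
    ]
    # Stage 2: each atom averages the features of the hyperedges containing it.
    out = []
    for v in range(len(features)):
        incident = [edge_feats[j] for j, e in enumerate(hyperedges) if v in e]
        if not incident:
            out.append(list(features[v]))  # isolated atom keeps its features
        else:
            out.append(
                [sum(f[k] for f in incident) / len(incident) for k in range(dim)]
            )
    return out
```

For the carbon skeleton of 1,3-pentadiene (C0=C1-C2=C3-C4), flagging bonds 0-1, 1-2 and 2-3 as conjugated gives the hyperedges `(0, 1, 2, 3)` and `(3, 4)`. Atom 3 belongs to both, so it averages the two hyperedge messages.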
- We'll use `conda` to install dependencies and set up the environment. We recommend using the Python 3.9 Miniconda installer.
- After installing `conda`, install `mamba` into the base environment. `mamba` is a faster, drop-in replacement for `conda`: `conda install mamba -n base -c conda-forge`
- Create a new environment named `mhnn` and install dependencies: `mamba env create -f env.yml`
- Activate the conda environment with `conda activate mhnn`.
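After activation, a quick check can confirm that the right environment is active before you launch a training script. This is a small hypothetical helper. It relies only on the fact that `conda activate` exports the active environment's name in the `CONDA_DEFAULT_ENV` variable. The environment name `mhnn` comes from the step above.

```python
import os


def mhnn_env_active(environ=None):
    """Return True if the conda environment named 'mhnn' is active.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if environ is None else environ
    # `conda activate <name>` exports the environment name as CONDA_DEFAULT_ENV.
    return env.get("CONDA_DEFAULT_ENV") == "mhnn"
```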
Dataset | Graphs | Task type | Number of tasks | Metric |
---|---|---|---|---|
OPV | 90,823 | regression | 8 | MAE |
OCELOTv1 | 25,251 | regression | 15 | MAE |
PCQM4Mv2 | 3,746,620 | regression | 1 | MAE |
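All three benchmarks are scored with mean absolute error. For each task, this is the mean of |prediction − target| over the N molecules. A minimal sketch follows. The function name is hypothetical. Reporting the unweighted mean over tasks alongside the per-task values is an assumption here, not necessarily how the paper aggregates its results.

```python
def multitask_mae(y_true, y_pred):
    """Per-task MAE and its unweighted average over tasks.

    y_true, y_pred: lists of rows, one row per molecule, one column per task.
    """
    n_tasks = len(y_true[0])
    per_task = []
    for t in range(n_tasks):
        # Absolute error of every molecule for task t.
        errors = [abs(p[t] - y[t]) for y, p in zip(y_true, y_pred)]
        per_task.append(sum(errors) / len(errors))
    # Assumption: a simple mean over tasks as a single summary number.
    return per_task, sum(per_task) / n_tasks
```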
The OPV (organic photovoltaic) dataset contains 90,823 unique molecules (monomers and soluble small molecules) together with their SMILES strings, 3D geometries, and optoelectronic properties from DFT calculations. OPV has four molecular tasks, the energy of the highest occupied molecular orbital for the monomer (
The OCELOTv1 dataset contains 25,251 organic
PCQM4Mv2 is a quantum chemistry dataset originally curated under the PubChemQC project. The task is to predict the DFT-calculated HOMO-LUMO energy gap of molecules from their 2D molecular graphs. With 3,746,620 graphs, PCQM4Mv2 is unprecedentedly large compared to other labeled graph-level prediction datasets.
- We provide training scripts for `MHNN` and baselines under `scripts/opv`. For example, to train `MHNN` on a single task, run: `bash scripts/opv/mhnn.sh [TASK_ID]`
- To train a model on all tasks, run: `bash scripts/opv/run_all_tasks.sh [MODEL_NAME]`
- The OPV dataset is downloaded automatically the first time you run training.
- Model names and task IDs for the different tasks can be found here.
- We provide training scripts for `MHNN` under `scripts/ocelot`. For example, to train `MHNN` on a single task, run: `bash scripts/ocelot/train.sh [TASK_ID]`
- To train `MHNN` on all tasks, run: `bash scripts/ocelot/run_all_tasks.sh`
- The OCELOTv1 dataset is downloaded automatically the first time you run training.
- Task IDs for the different tasks can be found here.
- We provide a training script for `MHNN` under `scripts/pcqm4mv2`. To train `MHNN`, run: `bash scripts/pcqm4mv2/train.sh`
- The PCQM4Mv2 dataset is downloaded automatically the first time you run training.
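The `run_all_tasks.sh` scripts already loop over tasks. If you prefer to drive runs from Python, the sketch below builds the equivalent per-task commands for the OPV and OCELOTv1 `MHNN` scripts. It is hypothetical. The task counts come from the dataset table, but task IDs are assumed to be consecutive integers starting at 0, which may not match the repository's task list. Check the task-ID table before relying on it. By default it only prints the commands (`dry_run=True`).

```python
import shlex
import subprocess

# Script path and number of tasks per dataset (task counts from the dataset table).
# Assumption: task IDs are consecutive integers starting at 0.
TASK_SCRIPTS = {
    "opv": ("scripts/opv/mhnn.sh", 8),
    "ocelot": ("scripts/ocelot/train.sh", 15),
}


def task_commands(dataset, first_task_id=0):
    """Return one `bash <script> <TASK_ID>` argument list per task."""
    script, n_tasks = TASK_SCRIPTS[dataset]
    return [["bash", script, str(first_task_id + i)] for i in range(n_tasks)]


def run_all(dataset, dry_run=True):
    """Print (and optionally run) the per-task training commands sequentially."""
    for cmd in task_commands(dataset):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_all("ocelot")` prints the 15 commands. Passing `dry_run=False` runs them one after another and stops at the first failing task.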
This work was supported as part of NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation.
If you find our work useful, please consider citing it:
@article{chen2024molecular,
author = {Chen, Junwu and Schwaller, Philippe},
title = "{Molecular hypergraph neural networks}",
journal = {The Journal of Chemical Physics},
volume = {160},
number = {14},
pages = {144307},
year = {2024},
doi = {10.1063/5.0193557},
url = {https://doi.org/10.1063/5.0193557},
}
If you have any questions, feel free to contact me at:
Junwu Chen: junwu.chen@epfl.ch