Toxicity prediction using latent representations of SMILES.
This project requires the following libraries.
- NumPy
- Pandas
- PyTorch > 1.2
- tqdm
- RDKit
Canonical SMILES of 1.7 million molecules from the ChEMBL24 dataset, each no longer than 100 characters, were used.
At every epoch, these canonical SMILES were randomly transformed using the SMILES enumeration method of E. J. Bjerrum.
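The two preprocessing steps above (length filtering and random SMILES enumeration) can be sketched with RDKit roughly as follows. This is a minimal illustration, not the code used in this project: the helper name `randomize_smiles`, the constant `MAX_LEN`, and the example molecules are assumptions made for the sketch. It shuffles the atom order and writes a non-canonical SMILES, which is one common way to do enumeration with RDKit.

```python
import random

from rdkit import Chem

MAX_LEN = 100  # length cutoff mentioned above (name is hypothetical)


def randomize_smiles(smiles, rng=random):
    """Return a random, non-canonical SMILES string for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    # Shuffle the atom order, then write the SMILES without canonicalization
    # so that the output follows the shuffled order.
    order = list(range(mol.GetNumAtoms()))
    rng.shuffle(order)
    shuffled = Chem.RenumberAtoms(mol, order)
    return Chem.MolToSmiles(shuffled, canonical=False)


# Example molecules (aspirin and caffeine), chosen only for illustration.
raw = ["CC(=O)Oc1ccccc1C(=O)O", "Cn1cnc2c1c(=O)n(C)c(=O)n2C"]

# Canonicalize, then keep only strings within the length cutoff.
canonical = [Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in raw]
kept = [s for s in canonical if len(s) <= MAX_LEN]

# In a training loop, this would be called again at every epoch so the
# model sees a different string for the same molecule each time.
variants = [randomize_smiles(s) for s in kept]
```

Each variant describes the same molecule as its canonical form, which can be checked by canonicalizing it again with `Chem.MolToSmiles(Chem.MolFromSmiles(variant))`.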
Toxicity data from https://www.kaggle.com/datasets/fanconic/smiles-toxicity were used.
The pre-trained model is here.
See experiments/ for example code.