m19eremeeva/project_NN

SMILES Transformer

Toxicity prediction using latent representations of SMILES.

Requirement

This project requires the following libraries.

  • NumPy
  • Pandas
  • PyTorch > 1.2
  • tqdm
  • RDKit
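One detail worth checking in the "PyTorch > 1.2" requirement: version strings cannot be compared as plain text, because "1.13.1" > "1.2" is False lexicographically. The sketch below compares numeric release parts instead. It is a minimal illustration, not part of the repository; the helper names are hypothetical, and it assumes Python 3.8+ for importlib.metadata and ignores pre-release tags.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.13.1+cu117' into (1, 13, 1), keeping only numeric release parts."""
    release = version.split("+")[0]  # drop local build tags such as +cu117
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_newer(installed: str, minimum: str) -> bool:
    """True if installed is strictly greater than minimum (e.g. '1.2.0' is not > '1.2')."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so (1, 2) and (1, 2, 0) compare equal
    b += (0,) * (width - len(b))
    return a > b


def torch_version_ok(minimum: str = "1.2") -> Optional[bool]:
    """Check the installed torch distribution; None means torch is not installed."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    return is_newer(installed, minimum)
```

For example, is_newer("1.13.1", "1.2") is True, while a naive string comparison of the same two values would reject that version.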

Dataset

Canonical SMILES of 1.7 million molecules from the ChEMBL24 dataset, each no longer than 100 characters, were used.
In every epoch, these canonical SMILES were randomly rewritten into equivalent non-canonical SMILES using SMILES enumeration (E. J. Bjerrum).
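The preprocessing described above can be sketched with RDKit, which the project lists as a requirement. This is a minimal illustration under stated assumptions, not the repository's code: the helper names canonical_filtered and enumerate_smiles are hypothetical, and the randomization (shuffle the atom order, then write SMILES without canonical reordering) follows the general idea of Bjerrum's SMILES enumeration rather than his exact implementation.

```python
import random
from typing import Iterable, List, Optional

from rdkit import Chem

MAX_LEN = 100  # length cap from the README: at most 100 characters of canonical SMILES


def canonical_filtered(smiles: Iterable[str], max_len: int = MAX_LEN) -> List[str]:
    """Canonicalize each SMILES with RDKit and keep those with <= max_len characters."""
    kept = []
    for smi in smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip strings RDKit cannot parse
            continue
        can = Chem.MolToSmiles(mol)  # canonical output is RDKit's default
        if len(can) <= max_len:
            kept.append(can)
    return kept


def enumerate_smiles(smi: str, rng: Optional[random.Random] = None) -> str:
    """Return a random, non-canonical SMILES string for the same molecule."""
    rng = rng or random.Random()
    mol = Chem.MolFromSmiles(smi)
    order = list(range(mol.GetNumAtoms()))
    rng.shuffle(order)  # random atom permutation
    shuffled = Chem.RenumberAtoms(mol, order)
    # canonical=False keeps the shuffled atom order, so the string changes
    return Chem.MolToSmiles(shuffled, canonical=False)
```

Calling enumerate_smiles afresh for every molecule at the start of each epoch gives the model a different string for the same molecule each time. Canonicalizing a randomized string again recovers the original canonical SMILES, which is a quick sanity check.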

Toxicity data were taken from https://www.kaggle.com/datasets/fanconic/smiles-toxicity.

Pre-training

The pre-trained model is available here.

Downstream Tasks

See experiments/ for example code.

Languages

  • Python 50.7%
  • Jupyter Notebook 49.3%