SMILES Transformer

Toxicity prediction using latent representations of SMILES strings.

Requirements

This project requires the following libraries.

NumPy
Pandas
PyTorch > 1.2
tqdm
RDKit
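The script below is a minimal environment check for the libraries listed above. It assumes "PyTorch > 1.2" means any release newer than the 1.2 series, and it assumes RDKit is importable as `rdkit`. The helper names are illustrative and are not part of this repository.

```python
# Minimal environment check for the requirements listed above.
# Assumption: "PyTorch > 1.2" is read as "any release newer than the 1.2 series".
import importlib.util
import re

# Import names of the required libraries (RDKit is imported as "rdkit").
REQUIRED_MODULES = ["numpy", "pandas", "torch", "tqdm", "rdkit"]


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def torch_version_ok(version: str) -> bool:
    """True if the major.minor part of `version` is newer than 1.2."""
    return parse_version(version)[:2] > (1, 2)


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the required modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        import torch
        status = "ok" if torch_version_ok(torch.__version__) else "too old"
        print(f"All libraries found; PyTorch {torch.__version__} ({status})")
```

`find_spec` only looks the module up without importing it, so the check stays fast even when PyTorch is installed.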

Dataset

Canonical SMILES of 1.7 million molecules from the ChEMBL24 dataset, each no more than 100 characters long, were used for pre-training.
These canonical SMILES were randomly transformed every epoch using the SMILES enumeration method of E. J. Bjerrum.
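The filtering and enumeration steps can be sketched with RDKit as follows. The enumeration follows the idea behind Bjerrum's SMILES enumerator: shuffle the atom order, then write a non-canonical SMILES, which yields a different string for the same molecule. This is an illustration, not the code in enumerator.py; the function names and the fixed seed are assumptions.

```python
# Sketch of SMILES enumeration in the style of E. J. Bjerrum:
# shuffle the atom order, then write a non-canonical SMILES.
import random
from typing import Optional

from rdkit import Chem

MAX_LEN = 100  # molecules are kept if their canonical SMILES has <= 100 characters


def canonicalize(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def keep(smiles: str) -> bool:
    """Length filter applied to canonical SMILES when building the corpus."""
    return len(smiles) <= MAX_LEN


def randomize_smiles(smiles: str, rng: random.Random) -> str:
    """Return a random, equally valid SMILES for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    order = list(range(mol.GetNumAtoms()))
    rng.shuffle(order)
    shuffled = Chem.RenumberAtoms(mol, order)
    # canonical=False keeps the shuffled atom order in the output string
    return Chem.MolToSmiles(shuffled, canonical=False)


if __name__ == "__main__":
    rng = random.Random(0)
    smiles = canonicalize("OCC1=CC=CC=C1")  # benzyl alcohol
    for epoch in range(3):
        print(epoch, randomize_smiles(smiles, rng))
```

Because every randomized string canonicalizes back to the same molecule, the model sees many spellings of one structure across epochs, which acts as data augmentation.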

Toxicity data for the downstream prediction task were taken from https://www.kaggle.com/datasets/fanconic/smiles-toxicity.
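The repository ships names_smiles.csv and names_labels.csv, which suggests that SMILES strings and toxicity labels are stored separately and keyed by a molecule name. The pandas sketch below joins two such tables. The column names `name`, `smiles` and `label` are hypothetical; check the real CSV headers before using it.

```python
# Hypothetical sketch: join SMILES strings with toxicity labels by molecule name.
# The column names below are assumptions; check the real CSV headers first.
import io

import pandas as pd

KEY = "name"  # hypothetical shared identifier column


def load_toxicity_table(smiles_csv, labels_csv) -> pd.DataFrame:
    """Inner-join the SMILES table and the label table on KEY."""
    smiles = pd.read_csv(smiles_csv)
    labels = pd.read_csv(labels_csv)
    # validate= raises if a molecule name appears more than once in either table
    merged = smiles.merge(labels, on=KEY, how="inner", validate="one_to_one")
    return merged.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny in-memory stand-ins (toy values) for names_smiles.csv and names_labels.csv.
    smiles_csv = io.StringIO("name,smiles\nmol_a,CCO\nmol_b,c1ccccc1\n")
    labels_csv = io.StringIO("name,label\nmol_b,1\nmol_a,0\n")
    print(load_toxicity_table(smiles_csv, labels_csv))
```

An inner join drops molecules that lack either a SMILES string or a label, so the row count after merging is worth checking against both files.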

Pre-training

The pre-trained model checkpoint is included in the repository as trfm_12_23000.pkl.
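Before wiring a checkpoint into a model, it can help to list the tensors it contains. The sketch below assumes the checkpoint (for example trfm_12_23000.pkl) is a PyTorch state_dict saved with `torch.save`; that format is an assumption, not something the README states. A tiny `nn.Linear` layer stands in for the real file.

```python
# Sketch: inspect a PyTorch checkpoint before wiring it into a model.
# Assumption: the checkpoint (e.g. trfm_12_23000.pkl) is a state_dict saved with torch.save.
import torch
from torch import nn


def summarize_state_dict(state_dict) -> int:
    """Print each tensor's name and shape; return the total number of values."""
    total = 0
    for name, tensor in state_dict.items():
        print(f"{name:60s} {tuple(tensor.shape)}")
        total += tensor.numel()
    return total


def load_checkpoint(path: str):
    """Load a checkpoint onto the CPU so no GPU is required for inspection."""
    return torch.load(path, map_location="cpu")


if __name__ == "__main__":
    # Stand-in for the real file: a tiny layer's state_dict.
    demo = nn.Linear(4, 2).state_dict()
    print("total values:", summarize_state_dict(demo))
    # For the real checkpoint (path assumed relative to the repository root):
    # state = load_checkpoint("trfm_12_23000.pkl")
    # print("total values:", summarize_state_dict(state))
```

`map_location="cpu"` lets a checkpoint saved on a GPU machine be opened on a CPU-only one.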

Downstream Tasks

See experiments/ for example code.
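As a rough illustration of a downstream toxicity task, the sketch below trains a plain logistic-regression classifier on fixed-length molecule vectors. It assumes each molecule has already been encoded by the pre-trained Transformer into a vector of size `dim`. Random vectors with synthetic labels stand in for those encodings. The choice of classifier is illustrative and is not necessarily what the project's experiments use.

```python
# Sketch of a downstream task: logistic regression on fixed-length molecule vectors.
# Assumption: each molecule has already been encoded by the pre-trained Transformer
# into a vector of size `dim`; random vectors stand in for those encodings here.
import numpy as np


def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
    """Fit weights w and bias b by full-batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of "toxic"
        grad = p - y                              # derivative of log loss w.r.t. the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Return 0/1 predictions using a 0.5 probability threshold."""
    return (X @ w + b > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    X = rng.normal(size=(200, dim))
    true_w = rng.normal(size=dim)
    y = (X @ true_w > 0).astype(float)  # synthetic, linearly separable labels
    w, b = train_logistic(X, y)
    print("training accuracy:", (predict(X, w, b) == y).mean())
```

Keeping the encoder frozen and training only a small classifier on its outputs is a cheap way to test how much toxicity information the latent representation carries.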

Repository files

README.md
build_corpus.py
build_vocab.py
dataset.py
enumerator.py
main_scrypt.ipynb
names_labels.csv
names_smiles.csv
pretrain_trfm.py
trfm_12_23000.pkl
utils.py
vocab.pkl

Repository: m19eremeeva/project_NN