Code for paper "Learning a Generative Model for Validity in Complex Discrete Structures" by David Janz, Jos van der Westhuizen, Brooks Paige, Matt Kusner and José Miguel Hernández-Lobato. Arxiv: https://arxiv.org/abs/1712.01664

Steps to reproduce results in table 4:

Install requirements. rdkit (chemical informatics, http://www.rdkit.org/docs/Install.html) can be tricky; we recommend using Anaconda. Note: the project uses pytorch 0.3.1, later versions are not backward compatible and will not run.
First prepare data by running python3 data data/zinc.smi
Generate samples from pretrained VAE for testing the grammar model. See optional) for training your own VAE. python3 vae --restore vae-pretrained --generate-samples
Train grammar_rnn model, the main paper contribution. This takes approximately ten minutes. python3 grammar_model

Results are displayed to terminal + plotted to tensorboard (default location /tmp/models)

Citation bibtex:

@article{janz2018learning, title={Learning a Generative Model for Validity in Complex Discrete Structures}, author={Janz, David and van der Westhuizen, Jos and Paige, Brooks and Kusner, Matt and Hern{'a}ndez-Lobato, Jos{'e} Miguel}, journal={International Conference on Learning Representations}, year={2018} }

Optional: Train your own character VAE for molecules. Enclosed implementation beats the original. Training takes a couple hours. If you get a cuda out of memory error, consider reducing --batch-size (defaults at 500). python3 vae

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

readme.md

Files

readme.md

Latest commit

History

readme.md

File metadata and controls