Recurrent-Autoencoder

This program is a recurrent (RNN) autoencoder. It is trained without labels (unsupervised learning); what we want from it is the latent vector produced by the encoder, which turns a variable-length sequence into a fixed-length vector. Specifically, we want to do this for SMILES strings, a text notation for the structure of molecules.
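
To make the "variable-length sequence in, fixed-length vector out" idea concrete, here is a minimal sketch of a recurrent encoder in plain Python. It is not the model in this repository: the weights are random and untrained, and the names make_encoder, encode, vocab and hidden_size are made up for illustration. It only shows that the final hidden state has the same size regardless of how long the input is.

```python
import math
import random


def make_encoder(vocab, hidden_size, seed=0):
    """Build a tiny, untrained tanh-RNN encoder (random weights, illustration only)."""
    rng = random.Random(seed)
    # One input vector per token, plus a hidden-to-hidden weight matrix.
    w_in = {tok: [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for tok in vocab}
    w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(hidden_size)]

    def encode(tokens):
        h = [0.0] * hidden_size  # initial hidden state
        for tok in tokens:
            # h_t = tanh(w_in[tok] + w_hh . h_{t-1})
            h = [math.tanh(w_in[tok][i]
                           + sum(w_hh[i][j] * h[j] for j in range(hidden_size)))
                 for i in range(hidden_size)]
        return h  # the final state plays the role of the latent vector

    return encode


# Hypothetical mini vocabulary; the real dictionary.txt has 256 words.
vocab = ["C", "N", "O", "C#C", "C1CC1"]
encode = make_encoder(vocab, hidden_size=8)

short_vec = encode(["C", "N"])
long_vec = encode(["C", "C#C", "O", "C1CC1", "N", "C"])
print(len(short_vec), len(long_vec))  # prints: 8 8
```

In an autoencoder, a second RNN (the decoder) would be trained to rebuild the input sentence from this vector, which is what forces the vector to carry the information in the sequence.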

Environment

python2 or python3
tensorflow
progressbar

Prepare data

There is a dictionary.txt with 256 words in it. Use gen-moles.py to generate sentences from the words in this dictionary; run python gen-moles.py -h for more information about how to use it. You can change the maximum atom length and the number of sentences if you want. The script writes a '.txt' file, which will be the input file of our neural net. Sentences look like this:

C C C N C O N O C N C C N N O O N N C C C O N N N N N C N O O C N C CO CO C#C C#N CO C=O C#C C#C CO C#N C=O C#C C#C CO C=O C#C C1CC1 C1CC1 CC#C CCC CC#N OCC=O COC(=O)N CCOC=O CC(=NO)C NC(=N)C#N OC1COC1 N=CNC=O O=CNC=O CCCOC CCC#CC o1nnnn1 CC(C)C#C FC(F)(F)F OC1COC1 o1nnnn1 NC(=O)C#C Nc1cocc1 CC(C)(C)C=O C/C(=N\O)/C#C [nH]1ccnc1 O[C@H]1C[C@H]1O O=c1[nH]cc[nH]1 CC(=O)O[CH][NH] [NH][C@H]1OC=CO1 [NH][C@@H]1NC=CO1
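
As a rough picture of what "generating sentences from a dictionary" can look like, the sketch below samples random words from a word list and joins them with spaces. This is an assumption about the general idea, not a description of how gen-moles.py actually works; generate_sentences, max_words and the short word list are hypothetical.

```python
import random


def generate_sentences(words, num_sentences, max_words, seed=0):
    """Sample space-separated sentences from a word list (illustrative only)."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(num_sentences):
        length = rng.randint(1, max_words)  # each sentence has 1..max_words words
        sentences.append(" ".join(rng.choice(words) for _ in range(length)))
    return sentences


# Hypothetical stand-in for the 256 words in dictionary.txt.
words = ["C", "N", "O", "CO", "C#C", "C1CC1", "OC1COC1"]
sentences = generate_sentences(words, num_sentences=3, max_words=5)
for sentence in sentences:
    print(sentence)
```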

Now use prepare-data.py to convert the input file into machine learning data. Use python prepare-data.py -h for more information about how to use it. The output is written to the directory you specify and consists of the training data, the validation data and the dictionary.
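
The kind of conversion described above usually comes down to three steps: build a word-to-id dictionary, turn each sentence into a list of ids, and split the result into training and validation sets. The sketch below shows those steps under stated assumptions; build_vocab, to_ids, split, the "<pad>" entry and the 20% validation fraction are illustrative choices, not the actual behaviour or file format of prepare-data.py.

```python
import random


def build_vocab(sentences):
    """Map each word to an integer id; id 0 is reserved for padding (assumption)."""
    vocab = {"<pad>": 0}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))  # new words get the next free id
    return vocab


def to_ids(sentence, vocab):
    """Translate a space-separated sentence into a list of word ids."""
    return [vocab[word] for word in sentence.split()]


def split(items, valid_fraction=0.2, seed=0):
    """Shuffle and split into (training, validation) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = int(len(items) * valid_fraction)
    return items[n_valid:], items[:n_valid]


sentences = ["C N O", "CO C#C", "C C1CC1 N", "O CO", "C#N C=O C"]
vocab = build_vocab(sentences)
encoded = [to_ids(s, vocab) for s in sentences]
train, valid = split(encoded)

print(encoded[0])              # prints: [1, 2, 3]
print(len(train), len(valid))  # prints: 4 1
```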

Train it

All the files related to the neural net are in the neural-net directory. Use train-autoencoder.py for training; you can adjust its parameters if you want. Use python train-autoencoder.py -h for more information about how to use it.

Test it

Use interactive.py to check whether the neural net reproduces your input. Running it opens an interactive session: type your sentence after "in:" appears, and the result is shown automatically after "ou:". Use python interactive.py -h for more information about how to use it.
Use codify-sentences.py to translate molecule sentences into vectors. It needs a file containing several molecule sentences, and it gives you a '.npy' file containing the state arrays.
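
A '.npy' file can be read back with NumPy's np.load. The sketch below is self-contained, so it first writes a stand-in file with random values; the file name states.npy and the shape (4 sentences, 16-dimensional states) are assumptions, since the real shape depends on how the network was trained. It then loads the array and compares two state vectors by cosine similarity, one simple way to use the fixed-length vectors.

```python
import os
import tempfile

import numpy as np

# Stand-in for a file written by codify-sentences.py (assumed shape: 4 x 16).
path = os.path.join(tempfile.mkdtemp(), "states.npy")
np.save(path, np.random.RandomState(0).rand(4, 16).astype(np.float32))

# Load the state arrays: one row per sentence under the assumption above.
states = np.load(path)
print(states.shape)  # prints: (4, 16)

# Cosine similarity between the vectors of the first two sentences.
a, b = states[0], states[1]
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine)
```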

Reference

https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
