This code implements JavaScript source code generation using deep recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. In a nutshell, our model takes text files of source code, minifies and "reads" them; after being trained, it generates new sequences of source code. Two approaches are currently available:
- Character Level Learning: This approach is based on Andrej Karpathy's char-rnn, modified to work with JavaScript repositories.
- Labeled Character Learning: The approach proposed here categorizes each character as one of eight simple classes (regex, keyword, string, number, operator, punctuator, identifier, other). The model takes both the character and its label as input and predicts the next character and label; a rough sketch of this joint input encoding is shown below.
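The exact encoding is defined by the preprocessing and training code; purely as a minimal sketch (the toy vocabulary, the encode helper, and the snippet below are hypothetical, not the repository's actual code), the character and its label can each be one-hot encoded and concatenated into a single input vector:

```python
import numpy as np

LABELS = ["regex", "keyword", "string", "number",
          "operator", "punctuator", "identifier", "other"]
CHARS = sorted(set("var x=1;"))                  # toy character vocabulary
char_index = {c: i for i, c in enumerate(CHARS)}
label_index = {l: i for i, l in enumerate(LABELS)}

def encode(char, label):
    """One-hot encode the character and its class label, then concatenate."""
    vec = np.zeros(len(CHARS) + len(LABELS))
    vec[char_index[char]] = 1.0
    vec[len(CHARS) + label_index[label]] = 1.0
    return vec

x = encode("v", "keyword")   # the 'v' of 'var', labeled as a keyword character
```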
This code is written in Python 2.7 using Keras with Theano as a backend.
As deep recurrent neural networks are computationally expensive, you are advised to use a GPU to train them.
Various libraries are used to manipulate & evaluate the source code. If you have pip installed, you can install them all with a single command:
pip install -r requirements.txt
Creating the datasets is handled by preprocess.py. It takes the path to the root directory of the JS projects as its only argument, then minifies, tags & shuffles the source code. The produced datasets are placed in data/input.
python preprocess.py [path to root directory of projects]
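preprocess.py defines the actual dataset format; as an illustration of the general idea only (the window length, step size, and make_sequences helper below are hypothetical), the minified text can be cut into fixed-length input windows, each paired with the character the model should predict next:

```python
SEQ_LEN, STEP = 40, 3   # illustrative window length and stride

def make_sequences(text):
    """Slice the corpus into overlapping input windows and their next characters."""
    sequences, next_chars = [], []
    for i in range(0, len(text) - SEQ_LEN, STEP):
        sequences.append(text[i:i + SEQ_LEN])   # input window
        next_chars.append(text[i + SEQ_LEN])    # character the model must predict
    return sequences, next_chars

seqs, targets = make_sequences("function add(a,b){return a+b;}" * 10)
```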
To train the model, call the corresponding implementation's script. Each training script takes one optional argument in case you want to resume training from a previously trained model; a rough sketch of the kind of Keras model being trained follows the commands below.
- Character:
python char-rnn.py [-r] [filepath to previous model]
- Labeled Character:
python labeled-char-rnn.py [-r] [filepath to previous model]
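For reference, a character-level LSTM in Keras 1.x (Theano backend) is typically stacked along these lines; the layer sizes, depth, and optimizer below are placeholders rather than the values actually used by char-rnn.py or labeled-char-rnn.py:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

SEQ_LEN, VOCAB_SIZE = 40, 96   # illustrative values

model = Sequential()
model.add(LSTM(512, return_sequences=True,
               input_shape=(SEQ_LEN, VOCAB_SIZE)))   # first stacked LSTM layer
model.add(LSTM(512))                                 # second LSTM layer
model.add(Dense(VOCAB_SIZE))                         # one output unit per character
model.add(Activation('softmax'))                     # distribution over the next character
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# model.fit(X, y, batch_size=128, nb_epoch=10)       # nb_epoch is the Keras 1.x name
```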
To sample the trained models, and thus generate source code, call the corresponding implementation's sampling script. Each sampling script takes one mandatory argument and three optional ones; the temperature-sampling step is sketched after the commands below.
- Character:
python sample.py [filepath to model] [-s] [primer text] [-t] [softmax temperature [0, 1]] [-l] [length to generate]
- Labeled Character:
python sample-labeled.py [filepath to model] [-s] [primer text] [-t] [softmax temperature [0, 1]] [-l] [length to generate]
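The softmax temperature controls how adventurous the generation is: values near 0 make the model stick to its most likely characters, while values near 1 keep the full predicted distribution. A minimal sketch of the usual temperature-sampling step (sample.py's internals may differ, and sample_index is a hypothetical name):

```python
import numpy as np

def sample_index(preds, temperature=1.0):
    """Draw the next character index from a softmax output rescaled by temperature."""
    preds = np.asarray(preds, dtype=np.float64)
    preds = np.log(preds + 1e-8) / temperature   # lower temperature sharpens the distribution
    exp_preds = np.exp(preds)
    probs = exp_preds / np.sum(exp_preds)
    return np.random.choice(len(probs), p=probs)

next_idx = sample_index([0.1, 0.6, 0.3], temperature=0.5)
```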
This project is licensed under the MIT License.