Entry for the machine learning tutorial How to Make a Chatbot - Intro to Deep Learning #12
- python (>=3.5) - Developed on 3.5. Untested on 2.7, 3.4, or 3.6, but ought to work on any version >= 3.5
- pandas (>=0.19) - Needed for some data sorting operations
- keras (2.0.2/2.0.3) - Deep learning FTW
- tensorflow (1.0.1) - My preferred Keras backend. Tensorboard is now integrated, so it'll need some tweaking to be Theano-compatible again
- h5py - for model checkpointing and saving weights
- keras-tqdm - because my Jupyter notebook freezes on the default Keras progbar. Also, it's awesome.
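If you need to install the dependencies, something along these lines should work (a rough suggestion based on the list above; exact version pins are up to you):
pip install "pandas>=0.19" keras==2.0.3 tensorflow==1.0.1 h5py keras-tqdm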
To run the command line interface, just type:
python main.py
I have provided pretrained weights for the first model, challenge 1 (default location ./models/c1/dmn00.hdf5). If this is the first time you are running the program with flags, or you just created a new model, you'll have to train it first, which you can do directly from the menu. If you do not have any trained models yet, you can select f from the menu to fit the model.
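For reference, the .hdf5 weight files are ordinary Keras weight checkpoints (hence the h5py dependency). A minimal, self-contained sketch of the save/load calls involved follows; the toy model is purely illustrative and is not the repo's network:

# Illustrative only: a toy model standing in for the repo's actual network,
# just to show the Keras weight save/load calls behind the .hdf5 files
# (the repo keeps these under models/c1/ or models/c2/).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

model.save_weights('dmn00.hdf5')   # writing weights requires h5py
model.load_weights('dmn00.hdf5')   # e.g. restoring pretrained weights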
The following command-line flags are available:
-m {modelname} - Set the name of the model and weight save file
-c {N} - Run challenge mode N. 1 is single-context bAbI, 2 is double-context bAbI. You can now select from any of the 20 bAbI Q/A tasks!
-a {N} - Run architecture N. 1 is the bAbI DMN, 2 is a more conventional convolutional LSTM (warning: hard on memory)
-v - Verbose flag
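For the curious, here is a rough sketch of how these flags could map onto an argparse parser. This is a guess at the interface, not a copy of the actual main.py:

# Hypothetical sketch of the flag parsing; the real main.py may differ.
import argparse

parser = argparse.ArgumentParser(description='bAbI Q/A chatbot')
parser.add_argument('-m', metavar='modelname', default='dmn00.hdf5',
                    help='name of the model and weight save file')
parser.add_argument('-c', metavar='N', type=int, default=1,
                    help='challenge mode: 1 = single-context bAbI, 2 = double-context bAbI')
parser.add_argument('-a', metavar='N', type=int, default=1,
                    help='architecture: 1 = bAbI DMN, 2 = convolutional LSTM')
parser.add_argument('-v', action='store_true', help='verbose output')
args = parser.parse_args()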
Example uses:
python main.py -c 2
to switch to the double supporting facts dataset.
python main.py -m modelname.hdf5
to specify a custom model name. Note that the software automatically places these in the folders models/c1/ or models/c2/ depending on the dataset.
If you want to suppress some of the TF notifications and the progbars, you can append 2> /dev/null to redirect that junk.
There are actually two challenges that came with the Q/A task: the single supporting fact and the double supporting facts. The former is pretty easy to knock out of the park, while the latter has proven quite stubborn. I was able to get >95% training accuracy but only 35-40% validation accuracy, a surefire sign of overfitting. I tried some clever hacks with the network, but I was not able to improve results. The authors claim that they aced the two supporting fact problem, but the Keras code as provided seems to fall short. Meh.
Here are some improvements I made to the demo network:
I added the option to compare against a convolutional LSTM architecture. So far, the results are kind of middling. It still needs to be configured for minibatch training.
The single forward-pass LSTM was converted to a bidirectional layer with the Bidirectional wrapper. Yuuuuuge improvement on the double-context task - 84.7% training accuracy after 260 epochs with the single-direction LSTM, improved to 90% after only 110 epochs with the bidirectional version. Nice! It asymptoted to 95% after about 150 epochs. However, I later realized these figures were pretty misleading, as the validation accuracy was not keeping pace with the training accuracy.
The single-context task got to 90% validation accuracy after 60 epochs instead of 85, a modest improvement.
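The Bidirectional swap looks roughly like this in Keras 2 (layer sizes and input shape here are illustrative, not the repo's exact configuration):

# Illustrative shapes only; the real model wires this into the DMN.
from keras.layers import Input, LSTM, Bidirectional

seq = Input(shape=(20, 64))                # (timesteps, features)
forward_only = LSTM(32)(seq)               # original single forward-pass LSTM
both_ways = Bidirectional(LSTM(32))(seq)   # drop-in bidirectional replacement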
Adding a TimeDistributed Dense (TDD) layer before the LSTM gave an additional jump in both training speed and overall accuracy, reaching 95% validation accuracy after 65 epochs on single-context (with the default 32 nodes).
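In Keras terms that is a TimeDistributed wrapper around a Dense layer, applied per timestep before the recurrent layer. Again, a sketch with made-up shapes rather than the repo's exact code:

# Made-up shapes; shows only the TimeDistributed(Dense) -> LSTM pattern.
from keras.layers import Input, Dense, LSTM, TimeDistributed

seq = Input(shape=(20, 64))
x = TimeDistributed(Dense(32, activation='relu'))(seq)  # per-timestep projection
x = LSTM(32)(x)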
Who doesn't love conv layers? Hoping to get better context recognition, I put a convolutional layer after the match dot-product part of the network. It didn't hurt performance, but it didn't give the gains on Challenge 2 I was looking for.
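Something along these lines, assuming the match tensor is the dot product of the encoded story and the encoded question (all shapes below are placeholders, not the repo's):

# Placeholder shapes; illustrates putting a Conv1D on top of the match tensor.
from keras.layers import Input, Conv1D, dot

story = Input(shape=(10, 64))     # encoded story sentences
question = Input(shape=(5, 64))   # encoded question
match = dot([story, question], axes=(2, 2))                      # shape (10, 5)
match = Conv1D(32, 3, padding='same', activation='relu')(match)  # conv over the match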
Adding a forward pass after the bidirectional pair did not give improvements; in fact, it caused the network to stall out around 55%. I've seen towers of LSTMs used to good effect in other NLP papers. Maybe they have some secret sauce I don't.
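For what it's worth, stacking like that requires the lower bidirectional layer to return full sequences. A sketch of the wiring I mean (shapes illustrative):

# Illustrative shapes; the lower layer must return sequences to feed another LSTM.
from keras.layers import Input, LSTM, Bidirectional

seq = Input(shape=(20, 64))
x = Bidirectional(LSTM(32, return_sequences=True))(seq)  # bidirectional pair
x = LSTM(32)(x)                                          # extra forward pass on top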
The main menu looks like this:
------------------------------
Sandra went back to the kitchen.
Sandra journeyed to the garden.
Mary went back to the kitchen.
Sandra went to the kitchen.
------------------------------
..: Back
1: Load Random Story
2: Query
3: Query (loop)
f: Fit for N epochs
q: Quit
Enter menu selection:
The currently loaded story is shown at the top. Enter the corresponding key to navigate the menu.
.. is currently non-functional.
1 loads a new story.
2 lets you type in a query. It goes back to the main menu after.
3 is like 2, but brings you back to the query prompt after, for convenience.
f lets you enter a number to fit the model for that many epochs.