This repository is an implementation of an RNNLM (recurrent neural network language model) using TensorFlow (r1.0).

Usage

Preparing data

We provide scripts to download and preprocess a public LM dataset; see the shell scripts in data. For any other corpus, prepare train.txt, valid.txt, and test.txt yourself, then run the main preprocessing file in the preprocessing module.
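For a custom corpus, the three files described above have to exist before preprocessing. The following is a minimal sketch of one way to produce them from a single text file. It assumes one sentence per line, which this README does not confirm, and the function name split_corpus and its parameters are hypothetical, not part of this repository.

```python
import random
from pathlib import Path


def split_corpus(corpus_path, out_dir, valid_frac=0.1, test_frac=0.1, seed=0):
    """Split a one-sentence-per-line corpus into train/valid/test files.

    Assumption: each non-empty line is one training example.
    Returns the number of lines written to each file.
    """
    text = Path(corpus_path).read_text(encoding="utf-8")
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]  # drop empty lines

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(lines)

    n_valid = int(len(lines) * valid_frac)
    n_test = int(len(lines) * test_frac)
    splits = {
        "valid.txt": lines[:n_valid],
        "test.txt": lines[n_valid:n_valid + n_test],
        "train.txt": lines[n_valid + n_test:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, chunk in splits.items():
        (out / name).write_text("\n".join(chunk) + "\n", encoding="utf-8")
    return {name: len(chunk) for name, chunk in splits.items()}
```

Calling split_corpus("corpus.txt", "data/my_corpus") would write the three files into data/my_corpus; the preprocessing step from this repository still has to be run on them afterwards.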

Training

You can train a language model with the default options with:

python run_lm.py --training --save_config_file train_config.json

This creates a directory named experiments and saves all checkpoints and logs there. By default, the script uses an LSTM cell and trains on the PTB dataset. For other options, run the script with --help.

Testing

The same script can also be used for testing. To reuse a saved configuration, pass its path with --load_config_filepaht; you can override individual settings by providing new values on the command line. For example:

python run_lm.py --load_config_filepaht experiments/out/train_config.json --no-training
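The load-then-override behaviour described above can be illustrated with a small standalone sketch. This is not the code in run_lm.py: the function load_config and the options --batch_size and --learning_rate are hypothetical and only show the general pattern of letting explicitly passed arguments take precedence over values from a saved JSON file.

```python
import argparse
import json


def load_config(argv, saved_config):
    """Merge a saved config dict with command-line overrides.

    Options left unset on the command line keep their saved values.
    """
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not given" apart from an explicit value.
    parser.add_argument("--batch_size", type=int, default=None)       # hypothetical option
    parser.add_argument("--learning_rate", type=float, default=None)  # hypothetical option
    args = parser.parse_args(argv)

    config = dict(saved_config)  # copy so the saved config is not mutated
    for key, value in vars(args).items():
        if value is not None:
            config[key] = value
    return config


# Stand-in for the contents of a saved train_config.json.
saved = json.loads('{"batch_size": 32, "learning_rate": 1.0}')
merged = load_config(["--batch_size", "64"], saved)
print(merged)
```

Here only batch_size is replaced; learning_rate keeps the value from the saved configuration.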

Extending

The code can be modified at several levels.

TODO

  • Support other cell types: add a command-line argument and improve feed_state(.)
  • Provide a decoder interface
  • Move to TensorFlow r1.1 (contrib.rnn is no longer supported)