Language model that works with multiple domains of data. (deprecated)

northanapon/adaptive_lm

 
 

This repository is an implementation of an RNNLM using TensorFlow (r1.0).

Usage

Preparing data

We provide scripts to download and preprocess public LM datasets; see the shell scripts in data. For other corpora, prepare train.txt, valid.txt, and test.txt, then run the main preprocessing file in the preprocessing module.
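For a custom corpus, the three files can be produced with a few lines of Python 3. The sketch below is only an illustration and is not part of this repository: the helper name split_corpus, the one-sentence-per-line input, and the 80/10/10 split are all assumptions, and the repository's preprocessing module still has to be run on the result.

```python
import os


def split_corpus(lines, out_dir, valid_frac=0.1, test_frac=0.1):
    """Write train.txt, valid.txt and test.txt into out_dir.

    Assumes `lines` is a list of sentences, one per entry. The split
    fractions are illustrative, not values taken from this repository.
    Returns the number of lines written to each file.
    """
    n = len(lines)
    n_valid = int(n * valid_frac)
    n_test = int(n * test_frac)
    n_train = n - n_valid - n_test
    # Contiguous slices: first the training part, then validation, then test.
    splits = {
        "train.txt": lines[:n_train],
        "valid.txt": lines[n_train:n_train + n_valid],
        "test.txt": lines[n_train + n_valid:],
    }
    os.makedirs(out_dir, exist_ok=True)
    for name, part in splits.items():
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            for line in part:
                f.write(line.strip() + "\n")
    return {name: len(part) for name, part in splits.items()}
```

The split is contiguous rather than shuffled, which keeps document order intact; shuffle the lines beforehand if the corpus is ordered by topic or date.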

Training

You can train a language model with the default options with:

python run_lm.py --training --save_config_file train_config.json

It will create a directory named experiments and save all checkpoints and logs there. By default, the script uses an LSTM cell and trains on the PTB dataset. For other options, run the script with --help.

Testing

The same script can also be used for testing. Reuse the saved configuration file by passing --load_config_filepaht, and override individual settings by providing new values on the command line. For example:

python run_lm.py --load_config_filepaht experiments/out/train_config.json --no-training
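The load-then-override behaviour described above can be sketched as follows. This is a minimal illustration of the general pattern, not the code in run_lm.py: the function merge_config and the options --batch_size and --learning_rate are hypothetical, and the sketch assumes the saved configuration is a flat JSON object.

```python
import argparse
import json


def merge_config(saved, argv):
    """Return the saved options updated with any flags given in argv.

    `saved` is a dict as loaded from a JSON configuration file. Only
    flags explicitly present in argv replace the saved values.
    """
    parser = argparse.ArgumentParser()
    # Hypothetical options for illustration only.
    # default=argparse.SUPPRESS leaves a flag out of the parsed namespace
    # unless the user actually passed it, so saved values are not
    # clobbered by parser defaults.
    parser.add_argument("--batch_size", type=int, default=argparse.SUPPRESS)
    parser.add_argument("--learning_rate", type=float, default=argparse.SUPPRESS)
    overrides = vars(parser.parse_args(argv))
    merged = dict(saved)
    merged.update(overrides)
    return merged


saved = json.loads('{"batch_size": 32, "learning_rate": 1.0}')
print(merge_config(saved, ["--batch_size", "64"]))
```

Here batch_size is replaced by the command-line value while learning_rate keeps the value stored in the configuration.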

Extending

There are many levels of modification in the code.

TODO

  • Support other cell types: add a command-line argument and improve feed_state(.)
  • Provide decoder interface
  • Change to TensorFlow r1.1 (contrib.rnn is no longer supported)
