Enhanced word representation for Out-of-Vocabulary on Ubuntu Dialogue Corpus

This is a TensorFlow implementation of the model described in:

Jianxiong Dong, Jim Huang. Enhance Word Representation for Out-of-Vocabulary on Ubuntu Dialogue Corpus.

The model has achieved state-of-the-art performance on the Ubuntu Dialogue Corpus V2 and the Douban Chinese dialogue corpus.

Contact

Code author: Jianxiong Dong

Requirements

Install the TensorFlow library (see the official installation instructions). For example:

virtualenv --system-site-packages tensorflow_dev
source tensorflow_dev/bin/activate
pip install --upgrade pip
pip install tensorflow-gpu==1.4.0

At least 16 GB of RAM; 32 GB is recommended.
A machine with an NVIDIA GPU with a large amount of GPU memory is preferable. The code has been tested on an NVIDIA Titan Xp (12 GB).

Dataset

We used the Ubuntu Dialogue Corpus V2. To make the results in the paper above easy to reproduce, the processed dataset is provided for download:

cd data
sh download.sh

Training a model

Run the following commands to start the training script. By default it runs for 230k steps to reach the maximum mean reciprocal rank (MRR) on the validation set.

cd bin
nohup sh ubuntu_train.sh &
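
For readers unfamiliar with the validation metric mentioned above, here is a minimal sketch of how mean reciprocal rank can be computed from per-candidate scores. It is not taken from this repository: the function name `mean_reciprocal_rank`, the convention that the correct response sits at index 0 of each candidate list, and the optimistic handling of ties are all assumptions made for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(score_lists: Sequence[Sequence[float]],
                         true_index: int = 0) -> float:
    """Average of 1/rank of the correct response over all examples.

    Assumption (not from the repository): in every candidate list the
    correct response is stored at position `true_index`.
    """
    if not score_lists:
        raise ValueError("need at least one example")
    total = 0.0
    for scores in score_lists:
        true_score = scores[true_index]
        # Rank is 1 plus the number of candidates scored strictly higher,
        # so ties are resolved in favour of the correct response.
        rank = 1 + sum(1 for s in scores if s > true_score)
        total += 1.0 / rank
    return total / len(score_lists)


if __name__ == "__main__":
    examples = [
        [0.9, 0.1, 0.3],  # correct response ranked 1st
        [0.2, 0.8, 0.1],  # correct response ranked 2nd
        [0.1, 0.5, 0.7],  # correct response ranked 3rd
    ]
    print(mean_reciprocal_rank(examples))
```

The three toy examples contribute 1, 1/2 and 1/3, so the result is their average, 11/18.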

Evaluating a model

If several runs exist in the 'runs' folder, the checkpoints of the latest run are used to evaluate model performance.

cd bin
sh ubuntu_test.sh
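
The "latest run" selection described above could look roughly like the sketch below. How the repository actually names and orders its run directories is not stated here, so this assumes, purely for illustration, that each run lives in a subdirectory of 'runs' whose name is an integer timestamp. The helper `latest_run_dir` is hypothetical and not part of the project.

```python
import os
import tempfile
from typing import Optional


def latest_run_dir(runs_root: str) -> Optional[str]:
    """Return the path of the newest run directory under runs_root, or None.

    Assumption (hypothetical layout): run directories are named with
    integer timestamps, so the numerically largest name is the latest run.
    """
    if not os.path.isdir(runs_root):
        return None
    candidates = [
        name for name in os.listdir(runs_root)
        if name.isdigit() and os.path.isdir(os.path.join(runs_root, name))
    ]
    if not candidates:
        return None
    newest = max(candidates, key=int)
    return os.path.join(runs_root, newest)


if __name__ == "__main__":
    # Build a throwaway 'runs' folder with two fake runs and pick the latest.
    with tempfile.TemporaryDirectory() as root:
        for name in ("1500000000", "1510000000"):
            os.makedirs(os.path.join(root, name, "checkpoints"))
        print(latest_run_dir(root))
```

To evaluate an older run under this assumed layout, the simplest option is to move the newer run directories out of 'runs' before calling the test script.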

