GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

This repository contains the source code for the following paper:

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

Model Overview

Prerequisites

Create a virtual environment (recommended):

conda create -n GRADE python=3.6
source activate GRADE

Install the required packages:

pip install -r requirements.txt

Install Texar locally:

cd texar-pytorch
pip install .

Note: Make sure that CUDA 10.1 is installed in your environment.

Data Preparation

GRADE is trained on the DailyDialog dataset proposed by Li et al. (2017).

For convenience, we provide the processed DailyDialog data. Download it and unzip it into the data directory. Also download the tools and unzip them into the root directory of this repo.

If you want to prepare the training data from scratch, follow these steps:

Install Lucene;
Run the preprocessing script:

cd ./script
bash preprocess_training_dataset.sh

Training

To train GRADE, please run the following script:

cd ./script
bash train.sh

Note that the checkpoint of our final GRADE model is provided. You can download it and unzip it into the root directory.

Evaluation

We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the evaluation directory has the following file structure:

.
└── evaluation
    ├── eval_data
    │   └── DIALOG_DATASET_NAME
    │       └── DIALOG_MODEL_NAME
    │           ├── human_ctx.txt
    │           └── human_hyp.txt
    └── human_score
        ├── DIALOG_DATASET_NAME
        │   └── DIALOG_MODEL_NAME
        │       └── human_score.txt
        └── human_judgement.json

Note: the complete human judgement data that we provide for metric evaluation is in human_judgement.json.

To evaluate GRADE, please run the following script:

cd ./script
bash eval.sh
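
A metric such as GRADE is usually judged by how closely its scores track human ratings. The sketch below is not part of this repository; it computes Pearson and Spearman correlation in plain Python between a list of metric scores and a list of human scores. It assumes the two lists are aligned example by example. The function names and the sample numbers are illustrative only, and eval.sh may compute its statistics differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two aligned lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Illustrative numbers only, not taken from the released data.
metric_scores = [0.12, 0.35, 0.50, 0.81, 0.90]
human_scores = [1.0, 2.5, 2.0, 4.0, 4.5]
print("Pearson:", round(pearson(metric_scores, human_scores), 3))
print("Spearman:", round(spearman(metric_scores, human_scores), 3))
```

Spearman is included alongside Pearson because human ratings are often ordinal, so agreement in ranking can be more informative than a linear fit.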

Using GRADE

To use GRADE on your own dialog dataset:

Put the whole dataset (raw data) into ./preprocess/dataset;
Update the function load_dataset in ./preprocess/extract_keywords.py so that it loads your dataset;
Prepare the context-response data that you want to evaluate and convert it into the following format:

.
└── evaluation
    └── eval_data
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── human_ctx.txt
                └── human_hyp.txt
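
One way to produce this layout is sketched below. It is only an illustration: it assumes that human_ctx.txt and human_hyp.txt are line-aligned, with one context and one response per line, and that the turns of a context are joined by a separator. The separator "|||", the helper name write_eval_data, and the sample dialogues are assumptions rather than something this README specifies, so check them against the provided evaluation data before relying on them.

```python
import tempfile
from pathlib import Path


def write_eval_data(root, dataset_name, model_name, pairs, turn_sep="|||"):
    """Write (context_turns, response) pairs into the eval_data layout.

    turn_sep is an ASSUMED separator between context turns; verify it
    against the files shipped in evaluation/eval_data.
    """
    out_dir = Path(root) / "evaluation" / "eval_data" / dataset_name / model_name
    out_dir.mkdir(parents=True, exist_ok=True)
    ctx_lines, hyp_lines = [], []
    for context_turns, response in pairs:
        # Collapse whitespace (including newlines) so that each example
        # occupies exactly one line and the two files stay aligned.
        ctx_lines.append(turn_sep.join(" ".join(t.split()) for t in context_turns))
        hyp_lines.append(" ".join(response.split()))
    (out_dir / "human_ctx.txt").write_text("\n".join(ctx_lines) + "\n", encoding="utf-8")
    (out_dir / "human_hyp.txt").write_text("\n".join(hyp_lines) + "\n", encoding="utf-8")
    return out_dir


# Made-up dialogues, written to a temporary directory for demonstration.
demo_pairs = [
    (["Hi, how are you?", "Fine, thanks. And you?"], "I am doing well."),
    (["Do you like tea?"], "Yes, green tea\nis my favourite."),
]
with tempfile.TemporaryDirectory() as tmp:
    out = write_eval_data(tmp, "YOUR_DIALOG_DATASET_NAME", "YOUR_DIALOG_MODEL_NAME", demo_pairs)
    print(sorted(p.name for p in out.iterdir()))
```

To write into the repository itself, pass the repository root as root instead of a temporary directory.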

Run the following script to evaluate the context-response data with GRADE:

cd ./script
bash inference.sh

Finally, the scores produced by GRADE are written to the following location:

.
└── evaluation
    └── infer_result
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── non_reduced_results.json
                └── reduced_results.json
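
This README does not describe the schema of the two result files, so the sketch below makes no assumption about it: it only loads each file with the standard json module and reports its top-level structure. The helper names describe and summarize_results are illustrative, and the path uses the placeholder directory names from the tree above.

```python
import json
from pathlib import Path


def describe(value):
    """Return a short description of a parsed JSON value."""
    if isinstance(value, dict):
        return "object with keys " + ", ".join(sorted(map(str, value)))
    if isinstance(value, list):
        return "list of %d items" % len(value)
    return "%s: %r" % (type(value).__name__, value)


def summarize_results(result_dir):
    """Describe each GRADE result file found in result_dir."""
    summary = {}
    for name in ("non_reduced_results.json", "reduced_results.json"):
        path = Path(result_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                summary[name] = describe(json.load(handle))
        else:
            # Report absent files instead of failing, e.g. before inference.sh has run.
            summary[name] = "missing"
    return summary


# Placeholder names from the tree above; replace them with your own.
result_dir = Path("evaluation/infer_result/YOUR_DIALOG_DATASET_NAME/YOUR_DIALOG_MODEL_NAME")
for name, description in summarize_results(result_dir).items():
    print(name, "->", description)
```

Run it from the repository root after inference.sh has finished; before that, both files are reported as missing.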

