This repository contains the source code of our EMNLP 2022 paper, IM2: an Interpretable and Multi-category Integrated Metric Framework for Automatic Dialogue Evaluation.
Create a virtual environment:
conda create -n im2 python=3.6
source activate im2
Install the required packages:
pip install -r requirements.txt
We provide the processed data: the train/valid/test datasets and all processed DSTC10 datasets.
DSTC10 data : ./dstc10-split-by-dialog-score
AB-AC and AB-BA training data : ./dailydialog
test data : /test_data_anno
The checkpoints are provided. They include the fine-tuned DialoGPT and RoBERTa models, the ab-ba and ab-bc models for IM2, etc.
cd ./ckpt
Download the checkpoint archive and unzip it in this directory.
You can use either your own data or the data we have processed. Run all sub-metrics to get their scores, then combine them into the final IM2 score.
For each sub-metric, use its model and load the corresponding checkpoint to get the sub-metric score. The scores are then combined as follows:
NUF = w1 ∗ LSC + w2 ∗ VUP + w3 ∗ 5-NUF
CR = w4 ∗ GRADE + w5 ∗ AB-AC + w6 ∗ AB-BA
IES = w7 ∗ Dist-n + w8 ∗ D-MLM + w9 ∗ 5-IES
IM2 = α1 ∗ NUF + α2 ∗ CR + α3 ∗ IES
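
The combination above can be sketched in a few lines of Python. This is only an illustration of the weighted sums, not the repository's own script: the function names `weighted_sum` and `im2_score`, the dictionary layout, and all numeric values below are assumptions made for the example. The actual weights w1..w9 and α1..α3 are not given here and must be supplied by you.

```python
from typing import Dict, Sequence


def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w_i * s_i); the two sequences must have equal length."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def im2_score(
    sub: Dict[str, float],   # sub-metric name -> score (hypothetical layout)
    w: Sequence[float],      # w1..w9, in the order used in the formulas
    alpha: Sequence[float],  # alpha1..alpha3
) -> Dict[str, float]:
    """Combine nine sub-metric scores into NUF, CR, IES and the IM2 score."""
    if len(w) != 9 or len(alpha) != 3:
        raise ValueError("expected 9 sub-metric weights and 3 category weights")
    nuf = weighted_sum([sub["LSC"], sub["VUP"], sub["5-NUF"]], w[0:3])
    cr = weighted_sum([sub["GRADE"], sub["AB-AC"], sub["AB-BA"]], w[3:6])
    ies = weighted_sum([sub["Dist-n"], sub["D-MLM"], sub["5-IES"]], w[6:9])
    im2 = weighted_sum([nuf, cr, ies], alpha)
    return {"NUF": nuf, "CR": cr, "IES": ies, "IM2": im2}


if __name__ == "__main__":
    # Placeholder scores and uniform weights, for illustration only.
    example_scores = {
        "LSC": 0.8, "VUP": 0.9, "5-NUF": 0.7,
        "GRADE": 0.6, "AB-AC": 0.5, "AB-BA": 0.4,
        "Dist-n": 0.3, "D-MLM": 0.2, "5-IES": 0.1,
    }
    example_w = [1 / 3] * 9
    example_alpha = [1 / 3] * 3
    print(im2_score(example_scores, example_w, example_alpha))
```

Keeping the category scores NUF, CR and IES in the returned dictionary, rather than returning only IM2, lets you inspect which category drives the final score.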