Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

General Installation

Install a MongoDB v4.2.9 server; all conversation data is stored there. Unzip data_dump/MongoDump.zip and import each of the 9 files into your MongoDB instance, repeating the following command once per file:

mongoimport --db auto_judge_final --collection annotated-dialogues-full-convai2 --file annotated-dialogues-full-convai2.json --jsonArray --username <user_name>  --password <pw>
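Repeating the import for all 9 files can be scripted. The following is a minimal sketch, not part of the repository: it builds one mongoimport command per JSON file, using only the flags shown above. It assumes that each collection is named after its file (as in the convai2 example) and that the unzipped files sit in a directory of your choosing; the function name `build_import_commands` is hypothetical.

```python
from pathlib import Path


def build_import_commands(dump_dir, user, password, db="auto_judge_final"):
    """Return one mongoimport argument list per *.json file in dump_dir."""
    commands = []
    for json_file in sorted(Path(dump_dir).glob("*.json")):
        commands.append([
            "mongoimport",
            "--db", db,
            # Assumption: the collection name is the file name without ".json".
            "--collection", json_file.stem,
            "--file", str(json_file),
            "--jsonArray",
            "--username", user,
            "--password", password,
        ])
    return commands


# Usage (the directory name is an assumption; adjust it to where you unzipped):
#   import subprocess
#   for cmd in build_import_commands("data_dump/MongoDump", "<user_name>", "<pw>"):
#       subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before touching the database.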

You need to install R...

Install Python 3.7. We suggest using Anaconda:

$ conda env create -f environment.yml

Adapt the config/annotation_app.json file as follows:

{
    "host": "ip_address of your MongoDB Server",
    "port": "port of mongodb",
    "user": "mongodb user name",
    "password": "pw of mongodb user",
    "database_name": "auto_judge_final",
    "package_collection_name": "packed-dialogues-full-{domain_name}",
    "sampled_collection_name": "sampled-dialogues-full-{domain_name}",
    "labelled_collection_name": "annotated-dialogues-full-{domain_name}",
    "local_port": 5003,
    "max_package_per_user": 3
}
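The three collection names contain a {domain_name} placeholder. This README does not state whether the tool substitutes it or whether you are expected to edit the file by hand, so the following is only an illustrative sketch of the substitution. The helper `resolve_config` is hypothetical and not part of the repository.

```python
import json

# Config keys that carry the {domain_name} placeholder in the example above.
COLLECTION_KEYS = (
    "package_collection_name",
    "sampled_collection_name",
    "labelled_collection_name",
)


def resolve_config(config_text, domain_name):
    """Parse the JSON config and fill {domain_name} into the collection names."""
    config = json.loads(config_text)
    for key in COLLECTION_KEYS:
        config[key] = config[key].replace("{domain_name}", domain_name)
    return config
```

With the domain name convai2, the labelled collection resolves to annotated-dialogues-full-convai2, which matches the collection used in the mongoimport example.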

Run the Annotation Tool

After cloning the repository, cd into autojudge_annotaiton.

To run the annotation tool:

$ python run.py

You can access the tool at localhost:5003

Ranking

After cloning the repository, cd into autojudge_annotaiton.

To get the Rankings based on Bootstrap Sampling (Table 1):

$ python templates/src/segment_analysis/segmented_bootstrap_sampling.py
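This README does not describe how the script computes its rankings. As a rough illustration of the general idea of ranking by bootstrap sampling, the sketch below resamples a list of (winner, loser) outcomes with replacement, ranks the bots by win rate in each resample, and counts how often each bot lands on each rank. The data layout and both function names are assumptions made for this example and are not taken from the repository.

```python
import random
from collections import defaultdict


def win_rates(outcomes):
    """Map each bot to the fraction of its conversations that it won."""
    wins, games = defaultdict(int), defaultdict(int)
    for winner, loser in outcomes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {bot: wins[bot] / games[bot] for bot in games}


def bootstrap_rank_counts(outcomes, n_resamples=1000, seed=0):
    """Count how often each bot reaches each rank across bootstrap resamples."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(n_resamples):
        # Draw as many outcomes as observed, with replacement.
        sample = rng.choices(outcomes, k=len(outcomes))
        rates = win_rates(sample)
        # Ties keep the order in which bots first appear in the resample.
        ranking = sorted(rates, key=rates.get, reverse=True)
        for rank, bot in enumerate(ranking, start=1):
            counts[bot][rank] += 1
    return {bot: dict(ranks) for bot, ranks in counts.items()}
```

A bot that occupies the same rank in nearly all resamples has a stable position; a bot whose rank counts are spread out cannot be separated reliably from its neighbours.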

To get the pairwise win rates (Table 1):

$ python templates/src/segment_analysis/win_function.py
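For orientation, a pairwise win rate can be computed as sketched below. This is an assumption-laden illustration and not the logic of win_function.py: the (bot_a, bot_b, winner) tuple layout, the treatment of ties, and the function name `pairwise_win_rates` are all hypothetical.

```python
from collections import defaultdict


def pairwise_win_rates(outcomes):
    """Compute, for each ordered pair (x, y), the share of x-vs-y conversations won by x.

    outcomes: iterable of (bot_a, bot_b, winner), where winner is bot_a, bot_b,
    or None for a tie. Ties count toward the total but as a win for neither bot.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for bot_a, bot_b, winner in outcomes:
        totals[(bot_a, bot_b)] += 1
        totals[(bot_b, bot_a)] += 1
        if winner is not None:
            loser = bot_b if winner == bot_a else bot_a
            wins[(winner, loser)] += 1
    return {pair: wins[pair] / totals[pair] for pair in totals}
```

Because ties are counted in the denominator, the two rates for a pair sum to less than 1 whenever ties occur.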

To perform the stability experiment (Figure 3a):

$ python templates/src/segment_analysis/ranking_significance.py

To perform the leave-one-out experiment (Figure 3b):

$ python templates/src/segment_analysis/ranking_significance.py -lo 1

Survival Analysis

The survival analysis is implemented in R and uses the following packages:

survival
survminer (needs a Fortran compiler to install)
glrt
icenReg

To export the survival data from your annotations, run python -m analysis.extract_event_data. This creates a CSV file, event_data.csv, which the R script reads.

Finally, run the R script at analysis/survival.R.
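The actual analysis lives in the R script. For readers unfamiliar with survival analysis, the sketch below shows a basic Kaplan-Meier estimator in Python. It is a simplification under stated assumptions: it handles only right-censored observations, it is not a port of survival.R, and because the column layout of event_data.csv is not documented here, it works on in-memory (time, event_observed) tuples rather than on the CSV file.

```python
def kaplan_meier(observations):
    """Estimate the survival curve from (time, event_observed) pairs.

    event_observed is True if the event happened at that time and False if
    the observation was right-censored. Returns a list of
    (time, survival_probability) for every time with at least one event.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, observed in observations if time == t and observed)
        leaving = sum(1 for time, _ in observations if time == t)
        if events:
            # Multiply by the share of at-risk subjects that survive time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        at_risk -= leaving
    return curve
```

For example, four observations with events at times 1, 2, and 3 and one censoring at time 2 give a survival probability of 0.75 after time 1.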

IAA

To run the label agreement analysis on, for example, the convai2 annotations, run:

$ python analysis/inter_annotator_agreement.py sampled-dialogues-full-convai2.json

The annotations are stored in data_dump/MongoDump.zip.
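This README does not say which agreement statistic the script reports. As one common example of label agreement between two annotators, the sketch below computes Cohen's kappa. The function name and the "bot"/"human" labels in the usage note are illustrative assumptions, not taken from the repository.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For the label lists ["bot", "bot", "human", "human"] and ["bot", "human", "human", "human"], observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5.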

References

If you use this code, please cite us:

@inproceedings{deriu2020spot_the_bot,
  title = {{Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems}},
  author = {Deriu, Jan and Tuggener, Don and von D{\"a}niken, Pius and Campos, Jon Ander and Rodrigo, Alvaro and Belkacem, Thiziri and Soroa, Aitor and Agirre, Eneko and Cieliebak, Mark},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  address = {Online},
  year = {2020},
}
