This script evaluates predictions for the Text-based NP Enrichment (TNE) dataset against the gold annotations and produces several scores.
% python3 evaluate.py \
    --predictions_file predictions.jsonl \
    --gold_file test.jsonl \
    --output_file metrics.json
% cat metrics.json
{"links-p": 0.5, "links-r": 0.5, "link-f1": 0.5, "identified_prep_acc": 0.9, "non_identified_prep_acc": 0.3, "micro-f1": 0.4}
The script takes two input files and produces one output file.
A predictions file has one JSON object per document with the indices of the predicted relations (0 stands for no-relation), in JSONL format. For example:
% cat predictions.jsonl
{"prepositions": [[0], [1], [2], [3], [4]], "links": [0, 1, 1, 1, 1]}
{"prepositions": [[0], [1], [2], [3], [4]], "links": [0, 1, 2, 3, 4]}
(Other attributes will be ignored)
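For reference, such a file can be written with the standard json module. A minimal sketch that just re-creates the (invented) example above:

import json

# Invented example predictions, copied from the snippet above: one dict per
# document, written as one JSON object per line (JSONL).
predictions = [
    {"prepositions": [[0], [1], [2], [3], [4]], "links": [0, 1, 1, 1, 1]},
    {"prepositions": [[0], [1], [2], [3], [4]], "links": [0, 1, 2, 3, 4]},
]

with open("predictions.jsonl", "w") as f:
    for doc in predictions:
        f.write(json.dumps(doc) + "\n")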
A gold file that has the gold annotations in JSONL format. For example:
% cat test.jsonl
{"id": 1, "links": [-1, 0, 1, -1], "prepositions": [1, 1, 2, 3, 4]}
{"id": 2, "links": [-1, 0, 1, 1, -1, 0, 0, 1, -1], "prepositions": [[0], [0], [2, 3], [4, 2], [0], [0], [0], [1], [0]]}
A JSON file with the different metrics we use, each in the range 0.0 to 1.0. For example:
% cat metrics.json
{"labeled_p": 0.5, "labeled_r": 0.5, "labeled_f1": 0.5, "unlabeled_p": 0.9, "unlabeled_r": 0.3, "unlabeled_f1": 0.4}
- The values above are invented and do not reflect the actual scoring functions
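Roughly speaking, the unlabeled scores ask only whether a relation was predicted at all, while the labeled scores also require the predicted preposition to match. A simplified, unofficial sketch of such a computation over aligned gold and predicted link lists, assuming 0 marks no-relation (the real evaluate.py is the authority):

def prf1(gold_links, pred_links, labeled):
    # Count true positives, false positives and false negatives over aligned
    # gold/predicted relation indices, where 0 means no-relation.
    tp = fp = fn = 0
    for g, p in zip(gold_links, pred_links):
        if g != 0 and p != 0:
            if not labeled or g == p:
                tp += 1
            else:
                # A relation was found, but with the wrong preposition.
                fp += 1
                fn += 1
        elif p != 0:
            fp += 1
        elif g != 0:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1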
Run unit tests with python3 test_evaluator.py.
Ultimately this evaluator is run in a Docker container. To test that it works there, run test.sh.
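For a quick manual check, a containerized run would look roughly like the following; the image name tne-evaluator and the assumption that evaluate.py is the image entrypoint are hypothetical (test.sh is the supported path):

% docker build -t tne-evaluator .
% docker run --rm -v $PWD:/data tne-evaluator \
    --predictions_file /data/predictions.jsonl \
    --gold_file /data/test.jsonl \
    --output_file /data/metrics.json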
To build and publish a Beaker image as the Leaderboard user, use the script publish_for_leaderboard.sh.