How To Add An Experiment
Adding an experiment to the webapp requires two steps:

1. Link the benchmark articles.
2. Evaluate the linking results.

Both steps are explained in the following.
You can link benchmark articles in three different ways:
- You can use a linker that is included in ELEVANT
- You can use a NIF API as is used for GERBIL
- You can use existing linking results and transform them into ELEVANT's internally used file format
To link the articles of a benchmark with a linker included in ELEVANT, you can use the script `link_benchmark.py`:

```
python3 link_benchmark.py <experiment_name> -l <linker_name> -b <benchmark_name>
```
The linking results will be written to `evaluation-results/<linker_name>/<adjusted_experiment_name>.<benchmark_name>.linked_articles.jsonl`, where `<adjusted_experiment_name>` is `<experiment_name>` in lowercase with all characters other than `[a-z0-9-]` replaced by `_`.
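The name adjustment can be reproduced with a simple shell one-liner. This is a minimal sketch of the lowercase-and-substitute rule described above, not necessarily ELEVANT's exact implementation:

```
# Lowercase the experiment name and replace every character outside [a-z0-9-] with "_":
echo "My Experiment 1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/_/g'
# prints: my_experiment_1
```

So an experiment named "My Experiment 1" ends up in files whose names start with `my_experiment_1`.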
You can provide multiple benchmark names at once, e.g. `-b kore50 msnbc reuters-128`, or even link all benchmarks in the `benchmarks` directory at once using `-b ALL`.
Properties specific to the selected linker, such as confidence thresholds, model paths, API URLs, etc., are read from the linker's config file at `configs/<linker_name>.config.json`.
Additionally, this will create a file `evaluation-results/<linker_name>/<adjusted_experiment_name>.<benchmark_name>.metadata.json` that contains metadata such as an experiment description and the experiment name, which will be displayed in the ELEVANT webapp. The description can be specified using the `-desc` argument. By default, the description from the linker's config file is used. You can adjust the description and experiment name in the metadata file at any time.
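For example, a call for the `baseline` linker on the KORE50 benchmark could look as follows. The linker and benchmark names are illustrative and the description text is made up; see Linkers and the `benchmarks` directory for the available names:

```
python3 link_benchmark.py "Baseline" -l baseline -b kore50 -desc "Baseline linker with default settings"
```

With the experiment name "Baseline", the results end up in `evaluation-results/baseline/baseline.kore50.linked_articles.jsonl`, which is the file used in the evaluation example further below.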
For a list of linkers included in ELEVANT, see Linkers.
You can also integrate your own linker into ELEVANT, see Integrating a Linker.
If you implemented a NIF API for your linker, as is needed to evaluate a linker using GERBIL, you can use that same NIF API to link benchmark articles with ELEVANT. To do so, use the `-api` option of the `link_benchmark.py` script:

```
python3 link_benchmark.py <experiment_name> -api <api_url> -pname <linker_name> -b <benchmark_name>
```
You can provide multiple benchmark names at once.
Each benchmark article text will then be sent as NIF context in a separate HTTP POST request to your NIF API. The API should return the linking results in NIF format. See the GERBIL documentation for more information on how to implement a NIF API.
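For example, assuming your NIF API is reachable at the placeholder URL `http://localhost:8080/nif`, a call could look like this:

```
python3 link_benchmark.py "My Linker" -api http://localhost:8080/nif -pname MyLinker -b kore50
```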
The linking results will be written to `evaluation-results/<adjusted_linker_name>/<adjusted_experiment_name>.<benchmark_name>.linked_articles.jsonl`, where `<adjusted_linker_name>` and `<adjusted_experiment_name>` are `<linker_name>` and `<experiment_name>` in lowercase with all characters other than `[a-z0-9-]` replaced by `_`. If the `-pname` option is omitted, `<adjusted_linker_name>` is `unknown_linker`.
If you already have linking results for a certain benchmark that you want to evaluate with ELEVANT, you can use the `link_benchmark.py` script to convert your linking results into the JSONL format used by us. This works if the text of the benchmark you linked corresponds to the text of one of the benchmarks in the `benchmarks` directory and if your linking results are in one of the following two formats:
- NLP Interchange Format (NIF)
- a very simple JSONL format
The formats are explained in detail in Linking Result Formats.
If you don't want to use any of the supported formats, you can write your own prediction reader, as explained in Writing a Custom Prediction Reader.
The script call to convert linking results into our format is:

```
python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat <nif|simple-jsonl> -pname <linker_name> -b <benchmark_name>
```
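For example, assuming your predictions for the KORE50 benchmark are stored in a hypothetical file `predictions/my_linker.kore50.jsonl` in the simple JSONL format, the call could look like this:

```
python3 link_benchmark.py "My Linker" -pfile predictions/my_linker.kore50.jsonl -pformat simple-jsonl -pname MyLinker -b kore50
```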
The converted linking results will be written to `evaluation-results/<adjusted_linker_name>/<adjusted_experiment_name>.<benchmark_name>.linked_articles.jsonl`, where `<adjusted_linker_name>` and `<adjusted_experiment_name>` are lowercased versions of `<linker_name>` and `<experiment_name>` with all characters other than `[a-z0-9-]` replaced by `_`. If the `-pname` option is omitted, `<adjusted_linker_name>` is `unknown_linker`.
To evaluate a linker's predictions, use the script `evaluate.py`:

```
python3 evaluate.py <path_to_linking_result_file>
```
You can provide multiple linking result files at once (for example, `python3 evaluate.py evaluation-results/*/*.linked_articles.jsonl`), which saves a lot of time since the mappings needed for the evaluation only have to be loaded once (loading takes about 30 seconds).
This will print precision, recall and F1 scores, and create two new files where the `.linked_articles.jsonl` file extension is replaced by `.eval_cases.jsonl` and `.eval_results.json`, respectively. For example,

```
python3 evaluate.py evaluation-results/baseline/baseline.kore50.linked_articles.jsonl
```

will create the files `evaluation-results/baseline/baseline.kore50.eval_cases.jsonl` and `evaluation-results/baseline/baseline.kore50.eval_results.json`. The `eval_cases` file contains information about each true positive, false positive and false negative case. The `eval_results` file contains the scores that are shown in the web app's evaluation results table.
In the web app, simply reload the page (you might have to disable caching) and the experiment will show up as a row in the evaluation results table for the corresponding benchmark.
If you want to remove an experiment from the web app, simply (re)move the corresponding `.linked_articles.jsonl`, `.metadata.json`, `.eval_cases.jsonl` and `.eval_results.json` files from the `evaluation-results/<linker_name>/` directory and reload the web app (disabling caching).
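For example, to remove the baseline experiment on KORE50 from the evaluation example above, you could delete all of its files at once. This assumes no other experiment in that directory shares the `baseline.kore50` prefix:

```
rm evaluation-results/baseline/baseline.kore50.*
```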