How To Add An Experiment

Adding an experiment to the webapp requires two steps:

  1. Linking benchmark articles
  2. Evaluating the linking results

Both steps are explained in the following.

Linking Benchmark Articles

You can link benchmark articles in three different ways:

  1. You can use a linker that is included in ELEVANT
  2. You can use a NIF API, as required for GERBIL
  3. You can use existing linking results and transform them into ELEVANT's internally used file format

Using an Included Linker

To link the articles of a benchmark with a linker included in ELEVANT, you can use the script link_benchmark.py:

python3 link_benchmark.py <experiment_name> -l <linker_name> -b <benchmark_name>

The linking results will be written to evaluation-results/<linker_name>/<adjusted_experiment_name>.<benchmark_name>.linked_articles.jsonl, where <adjusted_experiment_name> is <experiment_name> in lowercase with characters other than [a-z0-9-] replaced by _.

You can provide multiple benchmark names at once, e.g. -b kore50 msnbc reuters-128, or even link all benchmarks in the benchmarks directory at once using -b ALL.
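
For example, to link the KORE50 benchmark with the baseline linker under the experiment name Baseline (linker and benchmark names as they appear in the examples on this page), the call might look like this:

python3 link_benchmark.py Baseline -l baseline -b kore50

This would write the results to evaluation-results/baseline/baseline.kore50.linked_articles.jsonl.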

Properties specific to the selected linker such as confidence thresholds, model paths, API URLs etc. are read from the linker's config file at configs/<linker_name>.config.json.

Additionally, this will create a file evaluation-results/<linker_name>/<adjusted_experiment_name>.<benchmark_name>.metadata.json that contains metadata such as an experiment description and the experiment name, which will be displayed in the ELEVANT webapp. The description can be specified using the -desc argument; by default, the description from the linker's config file is used. You can adjust the description and experiment name in the metadata file at any time.
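
For example, to set a custom description for the hypothetical baseline experiment from above:

python3 link_benchmark.py Baseline -l baseline -b kore50 -desc "Baseline linker with its default configuration"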

For a list of linkers included in ELEVANT, see Linkers.

You can also integrate your own linker into ELEVANT, see Integrating a Linker.

Using a NIF API

If you have implemented a NIF API for your linker, as is required to evaluate a linker with GERBIL, you can use that same NIF API to link benchmark articles with ELEVANT. To do so, use the -api option of the link_benchmark.py script:

python3 link_benchmark.py <experiment_name> -api <api_url> -pname <linker_name> -b <benchmark_name>

You can provide multiple benchmark names at once.
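
For example, assuming your linker's NIF API listens at the hypothetical URL http://localhost:8080/link, the call might look like this:

python3 link_benchmark.py "My NIF Linker" -api http://localhost:8080/link -pname MyNIFLinker -b kore50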

Each benchmark article text is then sent as a NIF context in a separate HTTP POST request to your NIF API. The API should return the linking results in NIF format. See the GERBIL documentation for more information on how to implement a NIF API.
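
For orientation, the following is a minimal sketch of such an endpoint using Flask and the pynif library. It is not part of ELEVANT; the route, port and the placeholder linker call are assumptions you would replace with your own implementation.

from flask import Flask, Response, request
from pynif import NIFCollection

app = Flask(__name__)

@app.route("/link", methods=["POST"])
def link():
    # The request body contains one benchmark article as a NIF context (Turtle).
    collection = NIFCollection.loads(request.data.decode("utf-8"), format="turtle")
    for context in collection.contexts:
        text = context.mention  # the article text to link
        # Run your linker on `text` and add one phrase per predicted mention, e.g.:
        # context.add_phrase(beginIndex=start, endIndex=end, taIdentRef=entity_uri)
    # Return the (annotated) collection in NIF format.
    return Response(collection.dumps(format="turtle"), mimetype="text/turtle")

if __name__ == "__main__":
    app.run(port=8080)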

The linking results will be written to evaluation-results/<adjusted_linker_name>/<adjusted_experiment_name>.<benchmark_name>.linked_articles.jsonl, where <adjusted_linker_name> and <adjusted_experiment_name> are <linker_name> and <experiment_name> in lowercase with characters other than [a-z0-9-] replaced by _. If the -pname option is omitted, <adjusted_linker_name> is unknown_linker.

Using Existing Linking Results

If you already have linking results for a certain benchmark that you want to evaluate with ELEVANT, you can use the link_benchmark.py script to convert your linking results into ELEVANT's JSONL format. This works if the text of the benchmark you linked corresponds to the text of one of the benchmarks in the benchmarks directory and if your linking results are in one of the following two formats:

  1. NLP Interchange Format (NIF)
  2. a very simple JSONL format

The formats are explained in detail in Linking Result Formats.

If you don't want to use any of the supported formats you can write your own prediction reader, as explained in Writing a Custom Prediction Reader.

The script call to convert linking results into ELEVANT's format is:

python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat <nif|simple-jsonl> -pname <linker_name> -b <benchmark_name>

The converted linking results will be written to evaluation-results/<adjusted_linker_name>/<adjusted_experiment_name>.<benchmark_name>.linked_articles.jsonl where <adjusted_linker_name> and <adjusted_experiment_name> are lowercased versions of <linker_name> and <experiment_name> with characters other than [a-z0-9-] replaced by _. If the -pname option is omitted, <adjusted_linker_name> is unknown_linker.
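
For example, assuming your predictions for the KORE50 benchmark are stored in NIF format at the placeholder path /path/to/my_linker_results.ttl, the call could be:

python3 link_benchmark.py "My Linker" -pfile /path/to/my_linker_results.ttl -pformat nif -pname MyLinker -b kore50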

Evaluating Linking Results

To evaluate a linker's predictions use the script evaluate.py:

python3 evaluate.py <path_to_linking_result_file>

You can provide multiple linking result files at once (for example, python3 evaluate.py evaluation-results/*/*.linked_articles.jsonl) which saves a lot of time since the mappings needed for the evaluation have to be loaded only once (loading takes about 30s).

This will print precision, recall and F1 scores and create two new files in which the .linked_articles.jsonl file extension is replaced by .eval_cases.jsonl and .eval_results.json, respectively. For example,

python3 evaluate.py evaluation-results/baseline/baseline.kore50.linked_articles.jsonl

will create the files evaluation-results/baseline/baseline.kore50.eval_cases.jsonl and evaluation-results/baseline/baseline.kore50.eval_results.json. The eval_cases file contains information about each true positive, false positive and false negative case. The eval_results file contains the scores that are shown in the web app's evaluation results table.

In the web app, simply reload the page (you might have to disable caching) and the experiment will show up as a row in the evaluation results table for the corresponding benchmark.

Removing an Experiment

If you want to remove an experiment from the web app, simply (re)move the corresponding .linked_articles.jsonl, .metadata.json, .eval_cases.jsonl and .eval_results.json files from the evaluation-results/<linker_name>/ directory and reload the web app (disabling caching).
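
For example, to remove the baseline experiment on KORE50 used in the examples above, you could delete all of its files at once (a sketch; adjust the path to your own experiment, and move the files elsewhere instead of deleting them if you want to keep them):

rm evaluation-results/baseline/baseline.kore50.*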