This repo contains the evaluation scripts needed to replicate the IWSLT 2024 Quechua-to-Spanish speech translation task.
The evaluation is driven by a Python script (main.py) that measures the performance of speech translation systems using the BLEU and chrF metrics. It takes as input a reference text file and a folder containing the hypothesis text files, scores each hypothesis file, and writes the results to a tab-separated values (TSV) file. The script requires:
- Python 3.x
- pandas
- bleu_scorer
- chrF_scorer
- Clone the repository: `git clone https://github.com/Llamacha/iwslt24_que_esp`
- Navigate to the repository directory: `cd iwslt24_que_esp`
- Install the dependencies: `pip install -r requirements.txt`
- Ensure that your reference file and hypothesis files follow the naming convention described below.
- Open a terminal and navigate to the repository directory.
- Run the script with the following command:
python main.py --ref /path/to/reference/file --phyp /path/to/hypotheses/folder
- Wait for the script to finish processing all the hypothesis files.
- Find the results in a TSV file named results.tsv inside the hypotheses folder.
The reference file is a plain text file containing the ground-truth translations for the source sentences, with one sentence per line.
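For orientation, here is a minimal sketch of how a single hypothesis file could be scored against the reference. It uses sacrebleu as a stand-in for the bleu_scorer and chrF_scorer modules listed in the requirements, so the actual API used by main.py may differ.

```python
# Minimal scoring sketch: corpus-level BLEU and chrF for one hypothesis file.
# sacrebleu is used here as a stand-in for the repo's bleu_scorer/chrF_scorer modules.
import sacrebleu

def score_hypothesis(ref_path: str, hyp_path: str) -> tuple[float, float]:
    # Both files contain one sentence per line, aligned by line number.
    with open(ref_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]

    bleu = sacrebleu.corpus_bleu(hyps, [refs]).score
    chrf = sacrebleu.corpus_chrf(hyps, [refs]).score
    return bleu, chrf
```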
The hypotheses folder should contain one or more plain text files, each holding the translations produced by one speech translation system. Each file should be named {team_id}.st.{condition}.{type}.que-spa.txt, where {team_id} is a unique identifier for the team that generated the translations, {condition} is either constrained or unconstrained, and {type} is the translation type (primary, contrastive1, or contrastive2; see the Type column below).
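As an illustration of how the naming convention maps onto the output columns, the fields can be recovered by splitting the file name on dots. This is only a sketch of the assumed parsing, not necessarily the exact logic in main.py, and the example team id is hypothetical.

```python
import os

def parse_hypothesis_filename(path: str) -> tuple[str, str, str]:
    # Expected pattern: {team_id}.st.{condition}.{type}.que-spa.txt
    parts = os.path.basename(path).split(".")
    team_id, condition, trans_type = parts[0], parts[2], parts[3]
    return team_id, condition, trans_type

# Hypothetical example:
# parse_hypothesis_filename("teamA.st.constrained.primary.que-spa.txt")
# -> ("teamA", "constrained", "primary")
```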
The output file results.tsv contains the following columns:
- Participant: the unique identifier for each team that generated the translations.
- Condition: the condition under which the translations were generated (constrained or unconstrained).
- Type: the name of the translation type (primary, contrastive1, or contrastive2).
- BLEU: the BLEU score for each set of translations.
- chrF: the chrF score for each set of translations.
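Putting the pieces together, the result rows could be collected and written with pandas (listed as a dependency), reusing the helper functions sketched above. The folder scan below is an assumption about how main.py iterates over hypothesis files; only the column names and the results.tsv location come from the description above.

```python
import glob
import os
import pandas as pd

def write_results(ref_path: str, hyp_dir: str) -> None:
    rows = []
    # Assumed scan: every *.que-spa.txt file in the hypotheses folder is a submission.
    for hyp_path in sorted(glob.glob(os.path.join(hyp_dir, "*.que-spa.txt"))):
        team_id, condition, trans_type = parse_hypothesis_filename(hyp_path)
        bleu, chrf = score_hypothesis(ref_path, hyp_path)
        rows.append({"Participant": team_id, "Condition": condition,
                     "Type": trans_type, "BLEU": bleu, "chrF": chrf})

    df = pd.DataFrame(rows, columns=["Participant", "Condition", "Type", "BLEU", "chrF"])
    # results.tsv is written next to the hypothesis files, as described above.
    df.to_csv(os.path.join(hyp_dir, "results.tsv"), sep="\t", index=False)
```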
This project is licensed under the MIT License - see the LICENSE file for details.