This repo contains the evaluation scripts needed to replicate the IWSLT 2024 Quechua-to-Spanish speech translation task.
The evaluation is driven by a Python script (main.py) that measures the performance of speech translation systems using the BLEU and chrF metrics. It takes as input a reference text file and a folder containing the hypothesis text files, scores each hypothesis file, and writes the results to a tab-separated values (TSV) file. The script requires:
- Python 3.x
- pandas
- bleu_scorer
- chrF_scorer
- Clone the repository: `git clone https://github.com/Llamacha/iwslt24_que_esp`
- Navigate to the repository directory: `cd iwslt24_que_esp`
- Install the dependencies: `pip install -r requirements.txt`
- Ensure that your reference file and hypothesis files follow the naming convention described below.
- Open a terminal and navigate to the repository directory.
- Run the script with the following command:
python main.py --ref /path/to/reference/file --phyp /path/to/hypotheses/folder
- Wait for the script to finish processing all the hypothesis files.
- Find the results in a TSV file named results.tsv inside the hypotheses folder.
The reference file is a plain text file containing the ground-truth translations for the source sentences, with one sentence per line.
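For orientation, here is a minimal sketch of how a single hypothesis file could be scored against the reference. It uses sacrebleu as a stand-in for the bleu_scorer and chrF_scorer modules listed in the requirements, so the actual API used by main.py may differ.

```python
# Minimal scoring sketch: corpus-level BLEU and chrF for one hypothesis file.
# sacrebleu is used here as a stand-in for the repo's bleu_scorer/chrF_scorer modules.
import sacrebleu

def score_hypothesis(ref_path: str, hyp_path: str) -> tuple[float, float]:
    # Both files contain one sentence per line, aligned by line number.
    with open(ref_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]

    bleu = sacrebleu.corpus_bleu(hyps, [refs]).score
    chrf = sacrebleu.corpus_chrf(hyps, [refs]).score
    return bleu, chrf
```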
The hypotheses folder should contain one or more plain text files, each holding the translations produced by one speech translation system. Each file should be named {team_id}.st.{condition}.{type}.que-spa.txt, where {team_id} is a unique identifier for the team that generated the translations, {condition} is either constrained or unconstrained, and {type} is the translation type (primary, contrastive1, or contrastive2; see the Type column below).
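As an illustration of how the naming convention maps onto the output columns, the fields can be recovered by splitting the file name on dots. This is only a sketch of the assumed parsing, not necessarily the exact logic in main.py, and the example team id is hypothetical.

```python
import os

def parse_hypothesis_filename(path: str) -> tuple[str, str, str]:
    # Expected pattern: {team_id}.st.{condition}.{type}.que-spa.txt
    parts = os.path.basename(path).split(".")
    team_id, condition, trans_type = parts[0], parts[2], parts[3]
    return team_id, condition, trans_type

# Hypothetical example:
# parse_hypothesis_filename("teamA.st.constrained.primary.que-spa.txt")
# -> ("teamA", "constrained", "primary")
```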
The output file results.tsv contains the following columns:
- Participant: the unique identifier for each team that generated the translations.
- Condition: the condition under which the translations were generated (constrained or unconstrained).
- Type: the name of the translation type (primary, contrastive1, or contrastive2).
- BLEU: the BLEU score for each set of translations.
- chrF: the chrF score for each set of translations.
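Putting the pieces together, the result rows could be collected and written with pandas (listed as a dependency), reusing the helper functions sketched above. The folder scan below is an assumption about how main.py iterates over hypothesis files; only the column names and the results.tsv location come from the description above.

```python
import glob
import os
import pandas as pd

def write_results(ref_path: str, hyp_dir: str) -> None:
    rows = []
    # Assumed scan: every *.que-spa.txt file in the hypotheses folder is a submission.
    for hyp_path in sorted(glob.glob(os.path.join(hyp_dir, "*.que-spa.txt"))):
        team_id, condition, trans_type = parse_hypothesis_filename(hyp_path)
        bleu, chrf = score_hypothesis(ref_path, hyp_path)
        rows.append({"Participant": team_id, "Condition": condition,
                     "Type": trans_type, "BLEU": bleu, "chrF": chrf})

    df = pd.DataFrame(rows, columns=["Participant", "Condition", "Type", "BLEU", "chrF"])
    # results.tsv is written next to the hypothesis files, as described above.
    df.to_csv(os.path.join(hyp_dir, "results.tsv"), sep="\t", index=False)
```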
This project is licensed under the MIT License - see the LICENSE file for details.