Update 5/14/2023: The results for the Shared Task can now be found here. Thanks to all the teams for their submissions!
Update 4/21/2023: The surprise language -- Chatino -- has been released! It is now available under data/chatino-spanish
. In this directory, you can find the train and dev files, along with another file, ctp-eng.tsv
which contains parallel data between another variant of Chatino and English. We hope that this additional data, although it is from a different domain, will be helpful in improving the Chatino--Spanish translations.
For the 2023 Shared Task, Spanish (or another high-resource language) will be used as the source language, and model outputs should be in the target Indigenous language.
- This year's shared task will be similar to Track 2 of the 2021 ST: training on the development set is not allowed.
- Using the AmericasNLI test set for hyperparameter tuning or any form of decision making is not allowed.
- Evaluation will be done using the
evaluate.py
script. The final order of teams will be selected using average ChrF across all languages.
This year's baseline is the best performing system from the 2021 AmericasNLP Shared Task, particularly the B-0dev model. The repository for this model can be found here. Baseline performance for the system is described, per-language, below:
ISO | Language | ChrF |
---|---|---|
aym | Aymara | 0.283 |
bzd | Bribri | 0.165 |
cni | Asháninka | 0.258 |
gn | Guarani | 0.336 |
hch | Wixarika | 0.304 |
nah | Nahuatl | 0.266 |
oto | Otomí | 0.147 |
quy | Quechua | 0.343 |
shp | Shipibo-Konibo | 0.329 |
tar | Rarámuri | 0.184 |