This repository contains documentation and scripts for Cometoid and Chrfoid metrics.
Our models are implemented in Marian NMT. We provide python bindings for Marian, which can be installed as follows:
git clone https://github.com/marian-nmt/marian-dev
# Pybind PR (github.com/marian-nmt/marian-dev/pull/1013) not merged at the time of writing;
# so checkout the branch
git checkout tg/pybind-new
# build and install -- run this from project root, i.e., dir having pyproject.toml
# simple case : no cuda, default compiler
pip install -v .
# optiona: using a specific version of compiler (e.g., gcc-9 g++-9)
CMAKE_ARGS="-DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9" pip install -v .
# Option: with CUDA on
CMAKE_ARGS="-DCOMPILE_CUDA=ON" pip install .
# Option: with a specific version of cuda toolkit, e,g. cuda 11.5
CMAKE_ARGS="-DCOMPILE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5" pip install -v .
Tip
|
If CMAKE complains about missing C/C++ libraries, follow instructions at https://marian-nmt.github.io/quickstart/ and install them. |
The following reference-free/quality-estimation metrics are available:
-
chrfoid-wmt23
-
cometoid22-wmt23
-
cometoid22-wmt22
-
cometoid22-wmt21
pymarian-evaluate
is a convenient CLI that downloads and caches models, and runs them on the input data.
# download sample data
sacrebleu -t wmt21/systems -l en-zh --echo src > src.txt
sacrebleu -t wmt21/systems -l en-zh --echo Online-B > mt.txt
# chrfoid
paste src.txt mt.txt | pymarian-evaluate --stdin -m chrfoid-wmt23
# cometoid22-wmt22; similarly for other models
paste src.txt mt.txt | pymarian-evaluate --stdin -m cometoid22-wmt22
# Evaluator
from pymarian import Evaluator
marian_args = '-m path/to/model.npz -v path/to.vocab.spm path/to.vocab.spm --like comet-qe'
evaluator = Evaluator(marian_args)
data = [
["Source1", "Hyp1"],
["Source2", "Hyp2"]
]
scores = evaluator.run(data)
for score in scores:
print(score)
Tip
|
The pymarian-evaluate downloads and caches models under ~/.cache/marian/metrics/<model>
|
Models can ne downloaded from https://textmt.blob.core.windows.net/www/models/mt-metric/<model>.tgz
, where <model>
is one of the above (eg. cometoid22-wmt22
).
- UPDATE
-
Cometoid models are available on Hugginface hub at https://huggingface.co/collections/marian-nmt/cometoid-wmt23-metrics-66903bb137eadb9c5768d5f2
model_id=marian-nmt/cometoid22-wmt23
model=$(huggingface-cli download $model_id checkpoints/marian.model.bin)
vocab=$(huggingface-cli download $model_id vocab.spm)
# Score on CPU
N_CPU=6
paste src.txt mt.txt | marian evaluate --like comet-qe -m $model -v $vocab $vocab \
-w 8000 --quiet --cpu-threads $N_CPU --mini-batch 1 --maxi-batch 1
# Score on GPUs; here "--devices 0 1 2 3" means use 4 GPUs
paste src.txt mt.txt | marian evaluate --like comet-qe -m $model -v $vocab $vocab \
-w -4000 --quiet --devices 0 1 2 3 --mini-batch 16 --maxi-batch 1000
# install requirements; including mt-metrics-eval
pip install -r requirements.txt
# download metrics package
python -m mt_metrics_eval.mtme --download
# Flatten test data as TSV; this will be used in subsequent steps
# also this will help speedup evaluation
data_file=$(python -m evaluate -t wmt22 flatten --no-ref)
devices="0 1 2 3 4 5" # run on GPUs
n_segs=$(wc -l < $data_file)
for m in chrfoid-wmt23 cometoid22-wmt{21,22,23}; do
out=${data_file%.tsv}.seg.score.$m;
[[ -f $out._OK ]] && continue;
rm -f $out; `# remove incomplete file, if any`
cut -f4,6 $data_file | pymarian-evaluate --stdin -d $devices -m $m | tqdm --desc=$m --total=$n_segs > $out && touch $out._OK;
done
# average seg scores to system scores; then evaluate
for m in chrfoid-wmt23 cometoid22-wmt{21,22,23}; do
score_file=${data_file%.tsv}.seg.score.$m;
[[ -f $score_file._OK ]] || { echo "Error: $scores_file._OK not found"; continue;}
python -m evaluate -t wmt22 full --no-ref --name $m --scores $score_file;
done
This produces results.csv
$ cat results.csv | grep -E -i 'wmt22|cometoid|chrf|comet-22'
,wmt22.mqm_tab11,wmt22.da_sqm_tab8
*chrfoid-wmt23[noref],0.7773722627737226,0.8321299638989169
*cometoid22-wmt21[noref],0.7883211678832117,0.8483754512635379
*cometoid22-wmt22[noref],0.8065693430656934,0.8574007220216606
*cometoid22-wmt23[noref],0.8029197080291971,0.8592057761732852
COMET-22,0.8394160583941606,0.8393501805054152
MS-COMET-22,0.8284671532846716,0.8303249097472925
chrF,0.7335766423357665,0.7581227436823105
Please cite this paper: https://aclanthology.org/2023.wmt-1.62/
@inproceedings{gowda-etal-2023-cometoid,
title = "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into {E}ven Stronger Quality Estimation Metrics",
author = "Gowda, Thamme and
Kocmi, Tom and
Junczys-Dowmunt, Marcin",
editor = "Koehn, Philipp and
Haddon, Barry and
Kocmi, Tom and
Monz, Christof",
booktitle = "Proceedings of the Eighth Conference on Machine Translation",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.wmt-1.62",
pages = "751--755",
}