TER above 100? #208

Dear all,
I am getting TER scores above 100 for some MT-Ref pairs. Shouldn't the scores lie between 0 and 100? Is there any new update in sacrebleu that modifies the range? Any idea on that?
Examples: (see the three examples in the Colab notebook linked below)
Comments
What dataset is this?
Chemical patent data – not publicly available! Does this matter?
Well, the best way to diagnose this is if I can run your exact command. Can you share the invocation you're using, and maybe some sample data? How many references are you using?
@JoyeBright you would need to show a reproducible example that demonstrates how to get a TER score above 100. Here is some advice: https://stackoverflow.com/help/minimal-reproducible-example Either include code here or perhaps a link to a Colab. Make sure to mention the exact version of SacreBLEU or include a pip install command.
@mjpost, @bricksdont Below is the link to the Colab; you can see three examples that achieved TER above 100! https://colab.research.google.com/drive/13K16f9znwH_xYhVg9RoT0I0UiJADUuTA?usp=sharing Any idea? Thank you!
Thank you @JoyeBright, this is a score scaling issue. If you install an earlier version, the resulting scores are the same, except 100 times smaller. Example:

```python
from sacrebleu import TER
from argparse import Namespace

# Note: this Namespace-based constructor is the sacrebleu 1.x API;
# in 2.x the same options are accepted as plain keyword arguments.
args = Namespace(
    normalized=False, no_punct=False,
    asian_support=False, case_sensitive=False)
ter = TER(args)

sentences_2 = "Inoltre, a meno che altrimenti indicato, i disegni non sono in scala."
ref_2 = "Tabella 16:"

# sacrebleu==2.2.1 (newest)
print(ter.sentence_score(sentences_2, [ref_2]).score)
# 600.0

# sacrebleu==1.5.0
print(ter.sentence_score(sentences_2, [ref_2]).score)
# 6.0
```

As a quick workaround you can just divide scores by 100. But still, this is an issue that should be fixed.
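For completeness, the same check can be written against the 2.x API, where the options above are plain keyword arguments on the metric itself rather than a Namespace (a minimal sketch; the variable names here are illustrative):

```python
from sacrebleu.metrics import TER

# sacrebleu>=2.0 constructor: keyword arguments instead of a Namespace.
ter = TER(normalized=False, no_punct=False,
          asian_support=False, case_sensitive=False)

hyp = "Inoltre, a meno che altrimenti indicato, i disegni non sono in scala."
ref = "Tabella 16:"

# 2.x reports percentages, so this prints 600.0;
# 1.5.0 reported the same ratio unscaled, i.e. 6.0.
print(ter.sentence_score(hyp, [ref]).score)
```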
Dear @bricksdont, thanks for spotting the problem!
@JoyeBright maybe leave this issue open; I think it needs to be addressed.
I am reopening this issue, as it seems to be quite a serious problem if all TER scores are reported 100 times higher than they should be.
It looks like this was added for the 2.0 release: https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/metrics/ter.py#L129
I can revert that and do a 2.3.2 release.
Wait. It seems I was wrong, and the current implementation is OK. I don't have enough time now to double-check everything, but:

In the original TER paper, TER is defined as

TER = (# of edits) / (average # of reference words)

However, if the prediction is longer than the reference and does not share any words with the reference, we need to first delete all the words from the prediction and then add the words from the reference, so the number of edits exceeds the number of reference words and the ratio rises above 1.

It seems there have been no attempts to clip the scores at 100%. For example, the original Java implementation (java -jar tercom.7.25.jar) seems to report mostly numbers between 0 and 1, but values higher than 1 are possible as well.

In SacreBLEU v2.0, we have decided to report TER as percentages, i.e. the formula above multiplied by 100, so scores above 100 are likewise possible.
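As a quick sanity check on the arithmetic behind the 600.0 from the Colab example above (a sketch assuming the default tokenization simply lowercases and splits on whitespace, and exploiting the fact that the two sides share no tokens):

```python
hyp = "Inoltre, a meno che altrimenti indicato, i disegni non sono in scala."
ref = "Tabella 16:"

n_hyp = len(hyp.lower().split())  # 12 tokens
n_ref = len(ref.lower().split())  # 2 tokens

# With no tokens in common, a cheapest edit sequence substitutes n_ref
# tokens and deletes the remaining n_hyp - n_ref, i.e. n_hyp edits total.
edits = n_ref + (n_hyp - n_ref)
print(100 * edits / n_ref)  # 600.0, matching sacrebleu 2.2.1
```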
I was just looking into this myself and came to the same conclusion. See also #169 where @ozancaglayan notes this same phenomenon (also #140 and the release notes in #152). I am going to close this again—it seems the implementation is correct. |