-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
attempting to let meteor handle multiple references per prediction #164
Conversation
The documentation is not available anymore as the PR was closed or merged. |
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Thank you @lvwerra !
When I do:
The Do you have any ideas? do you think it's the tokenizer?... |
If you execute the first NLTK example yourself you get the same results? Or are they also different? |
Nope 😕
Whereas the result given by NLTK is This is weird, right? since we're using it under the hood? |
I just checked the NLTK example. It seems to be an issue with the version on their side: They switched from sentences to word tokens between with hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that', 'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', 'guarantees', 'the', 'military', 'forces', 'always', 'being', 'under', 'the', 'command', 'of', 'the', 'Party']
reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army', 'always', 'to', 'heed', 'the', 'directions', 'of', 'the', 'party']
round(meteor_score([" ".join(reference1), " ".join(reference2), " ".join(reference3)], " ".join(hypothesis1)),4)
>>> 0.7298 in hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that', 'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', 'guarantees', 'the', 'military', 'forces', 'always', 'being', 'under', 'the', 'command', 'of', 'the', 'Party']
reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army', 'always', 'to', 'heed', 'the', 'directions', 'of', 'the', 'party']
round(meteor_score([reference1, reference2, reference3], hypothesis1),4)
>>> 0.6944 So I would not worry about it in this PR but maybe open an issue on NLTK. |
Ok great! I'll open a PR with NLTK then 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this! A few minor comments. In addition, I think we should also document this in the README, right?
metrics/meteor/meteor.py
Outdated
meteor_score.single_meteor_score(ref, pred, alpha=alpha, beta=beta, gamma=gamma) | ||
for ref, pred in zip(references, predictions) | ||
] | ||
if any(isinstance(el, list) for el in references): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above. maybe you can also just check at the beginning once:
multiple_refs = isinstance(references[0], list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change (I think) -- is this what you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also updated the README!
adding comment about NLTK version
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
checking first element
Updating README to reflect changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor comment about reusing multiple_refs
. LGTM 🚀
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
I'm not sure I'm doing the
word_tokenize()
correctly in line 27 -- @lvwerra can you please help?