metrics: add BLEU #2535
Conversation
Hello @ydcjeff! Thanks for updating this PR.
Comment last updated at 2020-07-20 15:03:11 UTC
Hello @justusschock,
Codecov Report
@@ Coverage Diff @@
##           master   #2535   +/-  ##
=====================================
  Coverage      91%     91%
=====================================
  Files          70      72    +2
  Lines        5778    5831   +53
=====================================
+ Hits         5270    5323   +53
  Misses        508     508
Since BLEU is a metric specific to NLP, could you move your code to a file called
Is there another standard implementation, so we can compare our results with theirs in tests?
I requested some changes. It is basically all your math: it should use torch operations instead of `math` package ops.
Your current implementation should go under metrics/functional/sequence.py
Once we have finished iterating on the functional interface, we also need to add a module interface.
@williamFalcon is there a way to calculate these directly on tensors? If we have to convert back to strings first, we always incur a GPU sync, which we want to avoid.
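The point about torch operations can be sketched as follows (a hypothetical illustration, not the PR's actual code): computing the geometric mean of the n-gram precisions with torch ops keeps the value on the tensor's device, whereas `math.log`/`math.exp` would force the precisions back to Python floats.

```python
import torch

def geometric_mean(precisions: torch.Tensor) -> torch.Tensor:
    """Geometric mean of modified n-gram precisions, kept as a tensor.

    Using torch ops instead of the ``math`` package means the result
    stays on the same device as ``precisions``, so no GPU sync occurs.
    """
    return torch.exp(torch.mean(torch.log(precisions)))
```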
pytorch_lightning/metrics/bleu.py (outdated)
    return bleu
    # t = "the FAST brown fox jumped over the lazy dog"
can you remove these lines?
I have refactored with torch.Tensor, added a smooth argument, and tested against nltk. I also added nltk to test.txt for testing.
def test_with_sentence_bleu():
    nltk_output = sentence_bleu([reference1, reference2, reference3], hypothesis1, weights=(1, 0, 0, 0))
    pl_output = bleu_score([hypothesis1], [[reference1, reference2, reference3]], n=1).item()
    assert round(pl_output, 4) == round(nltk_output, 4)
rather use torch.allclose(...)
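The suggestion, sketched with hypothetical placeholder values: `torch.allclose` compares within relative/absolute tolerances rather than truncating both sides with `round(..., 4)`, so it is both stricter and less brittle.

```python
import torch

# Placeholder scores standing in for nltk_output and pl_output above.
nltk_output = 0.7506
pl_output = 0.7506002

# allclose checks |a - b| <= atol + rtol * |b| instead of comparing
# rounded values.
assert torch.allclose(torch.tensor(pl_output), torch.tensor(nltk_output))
```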
Please see my comments, but we are on the right track...
I thought that we were renaming this sequence module to
so
How shall I add it?
This should be: from p...metrics.nlp import Bleu
Okay
I guess this is ready to review/go. @williamFalcon @Borda
Ummm, so this is a wrapper on torchtext? I think we need our own implementation.
Ahhh, I had implemented it from scratch before and referenced the torchtext implementation a little. Then @justusschock suggested basing it on torchtext since it is in the PyTorch ecosystem, so I refactored with torchtext.
@williamFalcon: @Borda and I agreed that we don't need to duplicate this if it is already present in torchtext, since basically everyone who will use this will also have torchtext installed, and it was already an optional dependency.
I guess the point of metrics here is to centralize all metrics; otherwise we could have said the same about sklearn. We want our metrics package to be the reference implementation for any metric. So I would say: implement it here from scratch and test against torchtext for performance? The reason is that we want to give our community the flexibility to modify it as best practices change, and I know BLEU is one of those hotly debated metrics in terms of implementation details.
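A from-scratch version along these lines could look like the following minimal plain-Python sketch (clipped n-gram precision, brevity penalty, geometric mean); it illustrates the BLEU algorithm under discussion, not the PR's final `bleu_score`:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """BLEU for one tokenized candidate against tokenized references."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for ng, c in ngrams(ref, n).items():
                max_ref[ng] = max(max_ref[ng], c)
        clipped = sum(min(c, max_ref[ng]) for ng, c in cand.items())
        if clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(math.log(clipped / sum(cand.values())))
    # Brevity penalty against the closest reference length.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, and the clipping is what stops degenerate repeated-word candidates from inflating unigram precision.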
With sklearn it's more about GPU performance/syncs :) But I see your point. Then I'm sorry @ydcjeff :D But your code should still be available here :) So just copy/paste it in :D
@williamFalcon I have re-implemented it from scratch; anything you would like to add?
It's ready to be reviewed.
LGTM.
Just some minor comments (mainly on typing)
@ydcjeff awesome!!
What does this PR do?
Fixes #1301 (issue)
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃