metrics: add BLEU #2535
Conversation
Hello @ydcjeff! Thanks for updating this PR.
Comment last updated at 2020-07-20 15:03:11 UTC
Hello @justusschock,
Codecov Report
@@ Coverage Diff @@
##           master   #2535   +/-  ##
=====================================
  Coverage      91%     91%
=====================================
  Files          70      72    +2
  Lines        5778    5831   +53
=====================================
+ Hits         5270    5323   +53
  Misses        508     508
Since BLEU is a metric specific to NLP, could you move your code to a file called
Is there another standard implementation, so we can compare our results with theirs in tests?
I requested some changes. It is basically all your math: it should use torch operations instead of `math` package ops.
Your current implementation should go under metrics/functional/sequence.py
Once we have finished iterating on the functional interface, we also need to add a module interface.
@williamFalcon is there a way to calculate these directly on tensors? If we have to convert back to strings first, we always incur a GPU sync, which we want to avoid.
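The point about torch operations can be sketched as follows (a hypothetical illustration, not the PR's actual code): computing the geometric mean of the n-gram precisions with torch ops keeps the value on the tensor's device, whereas `math.log`/`math.exp` would force the precisions back to Python floats.

```python
import torch

def geometric_mean(precisions: torch.Tensor) -> torch.Tensor:
    """Geometric mean of modified n-gram precisions, kept as a tensor.

    Using torch ops instead of the ``math`` package means the result
    stays on the same device as ``precisions``, so no GPU sync occurs.
    """
    return torch.exp(torch.mean(torch.log(precisions)))
```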
pytorch_lightning/metrics/bleu.py (outdated)
    return bleu
    # t = "the FAST brown fox jumped over the lazy dog"
can you remove these lines?
I have refactored with torch.Tensor, added a smooth argument, and tested against nltk. I also added nltk to test.txt for testing.
def test_with_sentence_bleu():
    nltk_output = sentence_bleu([reference1, reference2, reference3], hypothesis1, weights=(1, 0, 0, 0))
    pl_output = bleu_score([hypothesis1], [[reference1, reference2, reference3]], n=1).item()
    assert round(pl_output, 4) == round(nltk_output, 4)
rather use torch.allclose(...)
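The suggestion, sketched with hypothetical placeholder values: `torch.allclose` compares within relative/absolute tolerances rather than truncating both sides with `round(..., 4)`, so it is both stricter and less brittle.

```python
import torch

# Placeholder scores standing in for nltk_output and pl_output above.
nltk_output = 0.7506
pl_output = 0.7506002

# allclose checks |a - b| <= atol + rtol * |b| instead of comparing
# rounded values.
assert torch.allclose(torch.tensor(pl_output), torch.tensor(nltk_output))
```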
Please see my comments, but we are on the right track...
I thought that we were renaming this sequence module to
so
How shall I add it?
This should be: from p...metrics.nlp import Bleu
Okay
I guess this is ready to review/go. @williamFalcon @Borda
Ummm, so this is a wrapper on torchtext? I think we need our own implementation.
Ahhh, I had implemented it from scratch before and referenced the torchtext implementation a little. Then @justusschock suggested basing it on torchtext since it is in the PyTorch ecosystem, so I refactored with torchtext.
@williamFalcon: @Borda and I agreed that we don't need to duplicate this if it is already present in torchtext, since basically everyone who will use this will also have torchtext installed, and it was already an optional dependency.
I guess the point of metrics here is to centralize all metrics; otherwise we could have said the same about sklearn. We want our metrics package to be the reference implementation for any metric. So I would say: implement it here from scratch and test against torchtext for performance? The reason is that we want to give our community the flexibility to modify it as best practices change, and I know BLEU is one of those hotly debated metrics in terms of implementation details.
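A from-scratch version along these lines could look like the following minimal plain-Python sketch (clipped n-gram precision, brevity penalty, geometric mean); it illustrates the BLEU algorithm under discussion, not the PR's final `bleu_score`:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """BLEU for one tokenized candidate against tokenized references."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for ng, c in ngrams(ref, n).items():
                max_ref[ng] = max(max_ref[ng], c)
        clipped = sum(min(c, max_ref[ng]) for ng, c in cand.items())
        if clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(math.log(clipped / sum(cand.values())))
    # Brevity penalty against the closest reference length.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, and the clipping is what stops degenerate repeated-word candidates from inflating unigram precision.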
With sklearn it's more about GPU performance/syncs :) But I see your point. Then I'm sorry @ydcjeff :D But your code should still be available here :) So just copy/paste it in :D
@williamFalcon I have re-implemented it from scratch; anything you would like to add?
It's ready to be reviewed.
LGTM.
Just some minor comments (mainly on typing)
@ydcjeff awesome!!
What does this PR do?
Fixes #1301 (issue)
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃