This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix norm nd grad #5306

Closed
wants to merge 4 commits into from

Conversation

zegnog

@zegnog zegnog commented Jul 8, 2021

Fixes #5298, #5300.

Changes proposed in this pull request:

  • Change the behavior of gradient normalization so that it no longer throws an error when the input sequence is more than 1-D.
  • Take the absolute value of the embedding gradient per token before summing and normalizing (see the sketch after this list).
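
A minimal sketch of the intended behavior, assuming a per-input gradient array with the batch axis already removed (shape `(seq_len, embedding_dim)`, or higher rank for nested sequences). The function name and shapes are illustrative, not the PR's exact diff:

```python
import numpy

def normalize_saliency(grad: numpy.ndarray) -> numpy.ndarray:
    """Hypothetical sketch of the proposed normalization, not the PR's exact code.

    `grad` is assumed to have the batch axis already removed; higher-rank
    inputs are handled by summing over every axis except the token axis.
    """
    # Take the absolute value per component before collapsing the embedding
    # dimension(s), so opposite-signed components do not cancel out.
    per_token = numpy.abs(grad).sum(axis=tuple(range(1, grad.ndim)))
    # L1-normalize so the saliency scores over tokens sum to one.
    return per_token / numpy.linalg.norm(per_token, ord=1)
```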

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.
  • codecov/patch reports high test coverage (at least 90%).
    You can find this under the "Actions" tab of the pull request once the other checks have finished.

@epwalsh
Member

epwalsh commented Jul 19, 2021

@matt-gardner I'd love to get your input on this if you have time, specifically with regard to the changes to how the gradients are summed (#5298).

@@ -48,9 +46,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# gradient and its respective embedding.
input_idx = int(key[-1]) - 1
# The [0] here is undo-ing the batching that happens in get_gradients.
emb_grad = numpy.sum(grad[0] * embeddings_list[input_idx][0], axis=1)
Contributor

This line is computing a dot product between the gradient vector and the embedding vector. The original implementation is correct. The proposed change is not a dot product anymore.
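
For context, a small self-contained check (with made-up shapes) showing that the original line is one dot product per token:

```python
import numpy

rng = numpy.random.default_rng(0)
seq_len, emb_dim = 5, 8
grad = rng.normal(size=(seq_len, emb_dim))       # stands in for grad[0] after the batch axis is removed
embedding = rng.normal(size=(seq_len, emb_dim))  # stands in for embeddings_list[input_idx][0]

# Elementwise multiply, then sum over the embedding axis: one dot product per token.
emb_grad = numpy.sum(grad * embedding, axis=1)

# Same result computed token by token with numpy.dot.
expected = numpy.array([numpy.dot(g, e) for g, e in zip(grad, embedding)])
assert numpy.allclose(emb_grad, expected)
```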

@@ -30,9 +29,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# Normalize results
for key, grad in grads.items():
# The [0] here is undo-ing the batching that happens in get_gradients.
embedding_grad = numpy.sum(grad[0], axis=1)
Contributor

It's not obvious, but this line is also part of a dot product. I don't remember why it's implemented this way, but it might be for consistency across the different interpreters, or there might be some efficiency considerations that I'm not remembering. If you look at line 116 you'll see the first half of the dot product computation.
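
A sketch of that two-part computation, assuming (as the comment above suggests) that the embedding has already been multiplied into the gradient elementwise elsewhere in the interpreter; the shapes and names here are illustrative:

```python
import numpy

rng = numpy.random.default_rng(0)
seq_len, emb_dim = 5, 8
raw_grad = rng.normal(size=(seq_len, emb_dim))
embedding = rng.normal(size=(seq_len, emb_dim))

# First half (done elsewhere, per the comment about line 116 in this sketch's
# assumption): the embedding is multiplied into the gradient elementwise.
grad = raw_grad * embedding

# Second half (the line under review): summing over the embedding axis
# completes a per-token dot product between gradient and embedding.
embedding_grad = numpy.sum(grad, axis=1)

assert numpy.allclose(embedding_grad, numpy.einsum("ij,ij->i", raw_grad, embedding))
```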

@dirkgr dirkgr closed this Jul 26, 2021
Development

Successfully merging this pull request may close these issues.

Correctness of summing up embeddings by the sequence dimension in the interpret package
4 participants