This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix norm nd grad #5306

Closed
wants to merge 4 commits into from

Conversation

zegnog

@zegnog zegnog commented Jul 8, 2021

Fixes #5298, #5300.

Changes proposed in this pull request:

  • Change the behavior of gradient normalization so that it no longer throws an error when the input sequence is more than 1-D.
  • Take the absolute value of the embedding gradient per token before summing and normalizing (see the sketch after this list).
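
A minimal sketch of the intended behavior, assuming a per-input gradient array with the batch axis already removed (shape `(seq_len, embedding_dim)`, or higher rank for nested sequences). The function name and shapes are illustrative, not the PR's exact diff:

```python
import numpy

def normalize_saliency(grad: numpy.ndarray) -> numpy.ndarray:
    """Hypothetical sketch of the proposed normalization, not the PR's exact code.

    `grad` is assumed to have the batch axis already removed; higher-rank
    inputs are handled by summing over every axis except the token axis.
    """
    # Take the absolute value per component before collapsing the embedding
    # dimension(s), so opposite-signed components do not cancel out.
    per_token = numpy.abs(grad).sum(axis=tuple(range(1, grad.ndim)))
    # L1-normalize so the saliency scores over tokens sum to one.
    return per_token / numpy.linalg.norm(per_token, ord=1)
```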

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.
  • codecov/patch reports high test coverage (at least 90%).
    You can find this under the "Actions" tab of the pull request once the other checks have finished.

@epwalsh
Member

epwalsh commented Jul 19, 2021

@matt-gardner I'd love to get your input on this if you have time, specifically with regard to the changes to how the gradients are summed (#5298).

@@ -48,9 +46,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# gradient and its respective embedding.
input_idx = int(key[-1]) - 1
# The [0] here is undo-ing the batching that happens in get_gradients.
emb_grad = numpy.sum(grad[0] * embeddings_list[input_idx][0], axis=1)
Contributor

This line is computing a dot product between the gradient vector and the embedding vector. The original implementation is correct. The proposed change is not a dot product anymore.
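
For context, a small self-contained check (with made-up shapes) showing that the original line is one dot product per token:

```python
import numpy

rng = numpy.random.default_rng(0)
seq_len, emb_dim = 5, 8
grad = rng.normal(size=(seq_len, emb_dim))       # stands in for grad[0] after the batch axis is removed
embedding = rng.normal(size=(seq_len, emb_dim))  # stands in for embeddings_list[input_idx][0]

# Elementwise multiply, then sum over the embedding axis: one dot product per token.
emb_grad = numpy.sum(grad * embedding, axis=1)

# Same result computed token by token with numpy.dot.
expected = numpy.array([numpy.dot(g, e) for g, e in zip(grad, embedding)])
assert numpy.allclose(emb_grad, expected)
```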

@@ -30,9 +29,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# Normalize results
for key, grad in grads.items():
# The [0] here is undo-ing the batching that happens in get_gradients.
embedding_grad = numpy.sum(grad[0], axis=1)
Contributor

It's not obvious, but this line is also part of a dot product. I don't remember why it's implemented this way, but it might be for consistency across the different interpreters, or there might be some efficiency considerations that I'm not remembering. If you look at line 116 you'll see the first half of the dot product computation.
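
A sketch of that two-part computation, assuming (as the comment above suggests) that the embedding has already been multiplied into the gradient elementwise elsewhere in the interpreter; the shapes and names here are illustrative:

```python
import numpy

rng = numpy.random.default_rng(0)
seq_len, emb_dim = 5, 8
raw_grad = rng.normal(size=(seq_len, emb_dim))
embedding = rng.normal(size=(seq_len, emb_dim))

# First half (done elsewhere, per the comment about line 116 in this sketch's
# assumption): the embedding is multiplied into the gradient elementwise.
grad = raw_grad * embedding

# Second half (the line under review): summing over the embedding axis
# completes a per-token dot product between gradient and embedding.
embedding_grad = numpy.sum(grad, axis=1)

assert numpy.allclose(embedding_grad, numpy.einsum("ij,ij->i", raw_grad, embedding))
```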

@dirkgr dirkgr closed this Jul 26, 2021
Development

Successfully merging this pull request may close these issues.

Correctness of summing up embeddings by the sequence dimension in the interpret package
4 participants