
Fix norm nd grad #5306

Closed · wants to merge 4 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -76,6 +76,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Changed behavior of `MultiOptimizer` so that while a default optimizer is still required, an error is not thrown if the default optimizer receives no parameters.
- Made the epsilon parameter for the layer normalization in token embeddings configurable.
- Changed the behavior of gradient normalization so that it no longer throws an error when the input sequence is more than 1-D.
- Take the absolute value of the embedding gradient per token before summing and normalizing.

### Removed

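The CHANGELOG entries above describe the fix in one line each. A minimal sketch of the shape issue, assuming an unbatched gradient with an extra sub-token dimension (the shapes and values are illustrative assumptions, not AllenNLP internals):

```python
import numpy

# Toy shape, assumed for illustration: an unbatched gradient with an extra
# sub-token dimension, e.g. (seq_len, num_subtokens, embedding_dim).
grad = numpy.random.randn(3, 4, 5)

# Old approach: sum over a fixed axis; the result is still 2-D, so the old
# list comprehension `[math.fabs(e) / norm for e in ...]` would raise
# TypeError, because each `e` is an array rather than a scalar.
old_sum = numpy.sum(grad, axis=1)                        # shape (3, 5)

# New approach: reduce over the last (embedding) axis and keep dims on the
# norm so the result is an array that broadcasts against the reduced gradient.
new_sum = numpy.sum(numpy.abs(grad), axis=-1)            # shape (3, 4)
norm = numpy.linalg.norm(new_sum, ord=1, keepdims=True)  # shape (1, 1)
normalized = new_sum / norm                              # shape (3, 4), no error
print(old_sum.shape, normalized.shape)
```

The error in the old code came from applying `math.fabs` to array elements of a still-2-D result; the vectorized `numpy.abs` plus a broadcasting division avoids that whether the reduced gradient is 1-D or 2-D.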
@@ -1,4 +1,3 @@
- import math
from typing import List, Dict, Any

import numpy
@@ -30,9 +29,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# Normalize results
for key, grad in grads.items():
# The [0] here is undo-ing the batching that happens in get_gradients.
- embedding_grad = numpy.sum(grad[0], axis=1)
Contributor comment:
It's not obvious, but this line is also part of a dot product. I don't remember why it's implemented this way, but it might be for consistency across the different interpreters, or there might be some efficiency considerations that I'm not remembering. If you look at line 116 you'll see the first half of the dot product computation.

- norm = numpy.linalg.norm(embedding_grad, ord=1)
- normalized_grad = [math.fabs(e) / norm for e in embedding_grad]
+ embedding_grad = numpy.sum(numpy.abs(grad[0]), axis=-1)
+ norm = numpy.linalg.norm(embedding_grad, ord=1, keepdims=True)
+ normalized_grad = embedding_grad / norm
grads[key] = normalized_grad

instances_with_grads["instance_" + str(idx + 1)] = grads
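The contributor comment above says the summed line is the second half of a dot product, with the gradient already multiplied elementwise by the embedding earlier in this interpreter. A toy sketch of why moving the absolute value inside the sum changes that quantity, under that assumption (values and shapes are made up for illustration):

```python
import numpy

# Toy values, assumed for illustration only. Per the comment above, `grad`
# is taken to already hold gradient * embedding elementwise, computed
# earlier in the interpreter.
gradient = numpy.array([[0.7, -0.6, 0.1]])   # one token, 3-dim embedding
embedding = numpy.array([[1.0, 1.0, 1.0]])
grad = gradient * embedding

signed_dot = numpy.sum(grad, axis=1)             # [0.2]: completes g . e
abs_sum = numpy.sum(numpy.abs(grad), axis=-1)    # [1.4]: sum of |g_i * e_i|

# The signed sum lets positive and negative components cancel; the absolute
# sum accumulates them, so the two scores can rank tokens differently once
# they are L1-normalized across the sequence.
print(signed_dot, abs_sum)
```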
8 changes: 3 additions & 5 deletions allennlp/interpret/saliency_interpreters/simple_gradient.py
@@ -1,5 +1,3 @@
- import math
-
from typing import List
import numpy
import torch
@@ -48,9 +46,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# gradient and its respective embedding.
input_idx = int(key[-1]) - 1
# The [0] here is undo-ing the batching that happens in get_gradients.
- emb_grad = numpy.sum(grad[0] * embeddings_list[input_idx][0], axis=1)
Contributor comment:
This line is computing a dot product between the gradient vector and the embedding vector. The original implementation is correct. The proposed change is not a dot product anymore.

- norm = numpy.linalg.norm(emb_grad, ord=1)
- normalized_grad = [math.fabs(e) / norm for e in emb_grad]
+ emb_grad = numpy.sum(numpy.abs(grad[0] * embeddings_list[input_idx][0]), axis=-1)
+ norm = numpy.linalg.norm(emb_grad, ord=1, keepdims=True)
+ normalized_grad = emb_grad / norm
grads[key] = normalized_grad

instances_with_grads["instance_" + str(idx + 1)] = grads
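The reviewer's objection for simple gradients can be seen numerically: the old code takes the absolute value after the per-token dot product, the new code before it. A small sketch with made-up gradients and embeddings (not AllenNLP code):

```python
import math
import numpy

# Toy values, assumed for illustration: gradients and embeddings for two
# tokens with a 2-dim embedding.
g = numpy.array([[1.0, -1.0], [0.5, 0.5]])
e = numpy.array([[1.0, 1.0], [1.0, 1.0]])

# Old: per-token dot product, absolute value taken only after the sum.
old = numpy.sum(g * e, axis=1)                                   # [0.0, 1.0]
old_scores = [math.fabs(v) / numpy.linalg.norm(old, ord=1) for v in old]
# -> [0.0, 1.0]: the first token's components cancel entirely.

# New: absolute value taken before the sum, so cancellation is ignored.
new = numpy.sum(numpy.abs(g * e), axis=-1)                       # [2.0, 1.0]
new_scores = new / numpy.linalg.norm(new, ord=1, keepdims=True)
# -> [0.667, 0.333]: the first token now gets the larger saliency score.

print(old_scores, new_scores)
```

Which behavior is desirable is the substance of the review discussion; the sketch only shows that the two formulas are not equivalent.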
7 changes: 3 additions & 4 deletions allennlp/interpret/saliency_interpreters/smooth_gradient.py
@@ -1,4 +1,3 @@
- import math
from typing import Dict, Any

import numpy
@@ -39,9 +38,9 @@ def saliency_interpret_from_json(self, inputs: JsonDict) -> JsonDict:
# Fine for now, but should fix for consistency.

# The [0] here is undo-ing the batching that happens in get_gradients.
- embedding_grad = numpy.sum(grad[0], axis=1)
- norm = numpy.linalg.norm(embedding_grad, ord=1)
- normalized_grad = [math.fabs(e) / norm for e in embedding_grad]
+ embedding_grad = numpy.sum(numpy.abs(grad[0]), axis=-1)
+ norm = numpy.linalg.norm(embedding_grad, ord=1, keepdims=True)
+ normalized_grad = embedding_grad / norm
grads[key] = normalized_grad

instances_with_grads["instance_" + str(idx + 1)] = grads