Task:
A better diagnostic tool for visualizing and debugging transformers than raw attention weights
Motivation:
- Information originating from different tokens gets increasingly mixed as it passes through the layers
- In higher layers the attention weights are rather uniform
- Raw attention weights are unreliable as explanation probes.
- Attention weights do not necessarily correspond to the relative importance of input tokens.
Method:
Proposes 2 alternatives
Both view the attention graph as a DAG: nodes = token representations at each layer, edges = attention weights
1) Attention rollout
Assumption: identities of input tokens are linearly combined through the layers based on attention weights
-> Compute how much information is propagated through a particular path by multiplying the weights of all edges in that path
-> Sum over all possible paths between two nodes (equivalent to multiplying the layer-wise attention matrices; see the sketch below)
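A minimal sketch of attention rollout in Python, assuming head-averaged per-layer attention matrices; the function and argument names are illustrative. Mixing in the identity to account for residual connections follows the paper.

```python
import numpy as np

def attention_rollout(attentions):
    """attentions: list of per-layer attention matrices, each of shape
    (seq_len, seq_len), averaged over heads, ordered bottom layer first."""
    seq_len = attentions[0].shape[0]
    rollout = np.eye(seq_len)
    for att in attentions:
        # Mix in the identity to account for residual connections,
        # then re-normalize so each row still sums to 1 (as in the paper).
        att = 0.5 * att + 0.5 * np.eye(seq_len)
        att = att / att.sum(axis=-1, keepdims=True)
        # Multiplying the layer matrices sums, over all paths between two
        # nodes, the product of the edge weights along each path.
        rollout = att @ rollout
    # rollout[i, j]: how much of input token j reaches position i at the top
    return rollout
```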
2) Attention flow
Motivated by the max-flow problem from graph theory
-> Treats the attention graph as a flow network
---> Capacities of the edges = attention weights
-> Use a max-flow algorithm to compute the maximum attention flow from any node in any layer to any of the input nodes
-> The weight of a single path is the minimum edge weight along the path, instead of the product of the weights (see the sketch below)
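A minimal sketch of attention flow using networkx max-flow; the graph construction and names are illustrative, since the paper does not prescribe a specific implementation. The pairwise max-flow computation is expensive for long sequences.

```python
import networkx as nx

def attention_flow(attentions, target_position):
    """attentions: list of per-layer attention matrices (seq_len x seq_len),
    head-averaged; attentions[l][i, j] = attention of position i in layer
    l+1 to position j in layer l. Returns the max flow from each input
    token to target_position in the top layer."""
    num_layers = len(attentions)
    seq_len = attentions[0].shape[0]
    G = nx.DiGraph()
    for l, att in enumerate(attentions):
        for i in range(seq_len):
            for j in range(seq_len):
                # Flow network: edge capacity == attention weight
                G.add_edge((l, j), (l + 1, i), capacity=float(att[i, j]))
    sink = (num_layers, target_position)
    # Each path carries at most its minimum edge capacity, and shared edges
    # limit the total flow (unlike rollout's product of weights).
    return [nx.maximum_flow_value(G, (0, j), sink) for j in range(seq_len)]
```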
Analysis:
Attention rollout/flow produces more interpretable explanations of the model's predictions than raw attention
Attention rollout/flow shows higher Spearman rank correlation between attention-based importance and input-gradient importance (computed as sketched below)
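A sketch of the correlation check using scipy's spearmanr; the score vectors here are dummy values standing in for per-token importance from rollout/flow and from input gradients.

```python
import numpy as np
from scipy.stats import spearmanr

# Dummy per-token importance scores (illustrative values only)
rollout_scores = np.array([0.05, 0.40, 0.30, 0.25])   # from attention rollout/flow
gradient_scores = np.array([0.10, 0.35, 0.35, 0.20])  # from input gradients
rho, pval = spearmanr(rollout_scores, gradient_scores)
print(f"SpearmanR = {rho:.3f}")
```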
Overall review: At 5-6 pages, it does not look like it was submitted anywhere; the idea is simple and demonstrated well visually. Being that short, though, the ablations and results are not sufficient, and the method is not explained in much detail, so it was hard to understand it completely. Since the code is publicly available, it should come in handy when working with transformer models. However, several simplifying assumptions are baked in, so it is hard to regard the result as an exact interpretation; the paper I introduced earlier appears to build on exactly this weakness.
A strength is that the method is task/architecture agnostic. It also looks usable for interpretation when applying transformers to high-dimensional inputs.