add content for attention maps
lisa-sousa committed Aug 28, 2024
1 parent 933ed5a commit f322836
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions docs/source/_model_specific_xai/am.rst
@@ -26,7 +26,7 @@ Attention maps for Sequence Transformers are usually generated through a self-at
where each element represents the attention weight between two tokens within one sequence or in relation to another sequence.

Attention Maps from Self-Attention
===================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The attention weights are calculated within a single sequence to model the interactions between different elements of that sequence. The process involves queries, keys, and values, all of which are derived from the same input data.
Here's a step-by-step explanation of how these attention weights are calculated:
@@ -48,7 +48,7 @@ Here's a step-by-step explanation of how these attention weight are calculated:
Here, :math:`\alpha_{ij}` represents the attention weight from the :math:`i`-th query to the :math:`j`-th key.
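
A minimal NumPy sketch of these steps is given below. It is only an illustration of scaled dot-product self-attention, not an implementation from any particular library; the names ``self_attention_map``, ``X``, ``W_q``, ``W_k`` and the toy dimensions are assumptions introduced here.

.. code-block:: python

    import numpy as np

    def self_attention_map(X, W_q, W_k):
        """Return the (seq_len x seq_len) attention map for one sequence X (illustrative sketch)."""
        Q = X @ W_q                       # queries derived from the input sequence
        K = X @ W_k                       # keys derived from the same input sequence
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)   # scaled dot-product scores
        # softmax over the key dimension yields the attention weights alpha_ij
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return weights / weights.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 8))       # toy input: 5 tokens, embedding size 8
    W_q = rng.standard_normal((8, 8))
    W_k = rng.standard_normal((8, 8))
    A = self_attention_map(X, W_q, W_k)   # A[i, j] corresponds to alpha_ij

Plotting ``A`` as a heatmap then gives the attention map for the sequence.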

Attention Maps from Cross-Attention
===================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An alternative approach is to generate attention maps through the cross-attention mechanism.
In cross-attention, attention is computed between two different sequences, allowing the model to focus on how elements in one sequence relate to elements in another, e.g. aligning words in one language with words in another.
@@ -67,6 +67,8 @@ The attention weights matrix :math:`A` obtained from these steps is used to crea
Each matrix element, :math:`A_{ij}`, represents the attention weight from the :math:`i`-th element of the query sequence to the :math:`j`-th element of the key sequence.
These maps visually demonstrate how the model attends to different parts of one sequence (represented by keys) in relation to each part of another sequence (represented by queries), thus providing insight into the model's learning and decision-making process.
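
The following sketch mirrors the self-attention example above for the cross-attention case, again only as an illustration under assumed toy NumPy inputs; ``X_query`` and ``X_key`` stand in for the two sequences, and all names and dimensions are hypothetical.

.. code-block:: python

    import numpy as np

    def cross_attention_map(X_query, X_key, W_q, W_k):
        """Attention map between a query sequence and a different key sequence (illustrative sketch)."""
        Q = X_query @ W_q                 # queries come from the first sequence
        K = X_key @ W_k                   # keys come from the second sequence
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        # A[i, j]: weight from the i-th query element to the j-th key element
        return weights / weights.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(1)
    X_query = rng.standard_normal((4, 8))  # e.g. 4 tokens in the target-language sentence
    X_key = rng.standard_normal((6, 8))    # e.g. 6 tokens in the source-language sentence
    W_q = rng.standard_normal((8, 8))
    W_k = rng.standard_normal((8, 8))
    A = cross_attention_map(X_query, X_key, W_q, W_k)  # shape (4, 6)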

Attention Maps for Text
--------------------------

References
------------