Hi,

I am building a project that needs `longformer`. However, my `attention_mask` has shape `[batch, seq_len, seq_len]`, not the usual `[batch, seq_len]`. I am really confused and do not know how to handle it when I look at these lines of code: longformer/longformer/longformer.py, lines 80 to 91 (commit 265314d).
I know that when the computation reaches `SelfAttention`, an `attention_mask` of shape `[batch, seq_len]` is extended as `[:, None, None, :]`, but it does not make sense to `squeeze()`/reshape my `[batch, seq_len, seq_len]` attention_mask in the same way. I also read the Longformer source code in HuggingFace Transformers and ran it with my attention_mask; it also raises an error because of the attention_mask's dimensions, in these lines of code: transformers/models/longformer/modeling_longformer.py#L587-L597
```python
# values to pad for attention probs
remove_from_windowed_attention_mask = (attention_mask != 0)[:, :, None, None]

# cast to fp32/fp16 then replace 1's with -inf
float_mask = remove_from_windowed_attention_mask.type_as(query_vectors).masked_fill(
    remove_from_windowed_attention_mask, -10000.0
)
# diagonal mask with zeros everywhere and -inf inplace of padding
diagonal_mask = self._sliding_chunks_query_key_matmul(
    float_mask.new_ones(size=float_mask.size()), float_mask, self.one_sided_attn_window_size
)
```
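To make the shape problem concrete, here is a toy sketch of the indexing involved (the sizes below are placeholders I picked for illustration, not real model dimensions):

```python
import torch

batch, seq_len = 2, 8

# The usual 2-D mask: [batch, seq_len]
mask_2d = torch.ones(batch, seq_len)
print(mask_2d[:, None, None, :].shape)         # torch.Size([2, 1, 1, 8]) -> standard extended mask
print((mask_2d != 0)[:, :, None, None].shape)  # torch.Size([2, 8, 1, 1]) -> what the windowed-attention code builds

# My 3-D mask: [batch, seq_len, seq_len]
mask_3d = torch.ones(batch, seq_len, seq_len)
print((mask_3d != 0)[:, :, None, None].shape)  # torch.Size([2, 8, 1, 1, 8]) -> one dimension too many
```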
If I use my attention_mask, `remove_from_windowed_attention_mask` ends up with shape `[batch, seq_len, 1, 1, seq_len]`, and `ValueError: too many values to unpack (expected 4)` is raised when executing these lines of code: transformers/models/longformer/modeling_longformer.py#L802-L808
```python
def _sliding_chunks_query_key_matmul(self, query: torch.Tensor, key: torch.Tensor, window_overlap: int):
    """
    Matrix multiplication of query and key tensors using with a sliding window attention pattern. This
    implementation splits the input into overlapping chunks of size 2w (e.g. 512 for pretrained Longformer)
    with an overlap of size window_overlap
    """
    batch_size, seq_len, num_heads, head_dim = query.size()
```
In short, in both implementations of `LongformerSelfAttention` I run into trouble because of my 3-dimensional `attention_mask`. I would be grateful if you could help.
Thanks,
Khang