-
I used to get attention scores from the different vision transformer blocks using f'blocks.{str(i)}.attn.softmax' (where i ranges from 0 to the number of layers of the model minus 1). After updating timm I can't find this layer in the ViTs anymore. If I run get_graph_node_names, these are the only nodes I get from the attention module: 'backbone.blocks.11.norm1', which, unfortunately, gives only the pre-self-attention activations or the last projections. I can't figure out which version I was using before, but it would be nice to be able to get this specific softmax activation in newer versions as well.
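For context, this is roughly how I check the available nodes (the model name is just an example, and in my setup the model sits under a 'backbone.' prefix, which I drop here):

```python
import timm
from torchvision.models.feature_extraction import get_graph_node_names

# Example model; any timm ViT shows the same behaviour.
model = timm.create_model('vit_base_patch16_224', pretrained=False)
train_nodes, eval_nodes = get_graph_node_names(model)

# In newer timm versions this lists the qkv/proj/norm-type nodes of the block,
# but no 'blocks.11.attn.softmax' node.
print([n for n in eval_nodes if 'blocks.11.attn' in n])
```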
-
My temporary solution:
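For reference, a rough sketch of this kind of workaround (the hook-on-qkv idea, the model name, and the shapes below are illustrative assumptions, not necessarily the exact code from this comment): capture the output of each block's qkv projection with a forward hook and recompute the softmaxed attention map from q and k.

```python
import timm
import torch

model = timm.create_model('vit_base_patch16_224', pretrained=False).eval()
attn_maps = {}

def make_hook(idx, attn):
    # Recompute the attention map from the qkv output, so it works even when
    # timm routes the forward pass through F.scaled_dot_product_attention.
    # Note: q_norm/k_norm are ignored here (they are Identity for standard ViTs).
    def hook(module, inputs, output):
        B, N, _ = output.shape
        head_dim = output.shape[-1] // (3 * attn.num_heads)
        qkv = output.reshape(B, N, 3, attn.num_heads, head_dim).permute(2, 0, 3, 1, 4)
        q, k, _ = qkv.unbind(0)
        attn_maps[idx] = ((q @ k.transpose(-2, -1)) * attn.scale).softmax(dim=-1).detach()
    return hook

for i, block in enumerate(model.blocks):
    block.attn.qkv.register_forward_hook(make_hook(i, block.attn))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

print(attn_maps[11].shape)  # (1, num_heads, N, N)
```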
@scabini F.sdpa makes it inaccessible. Set
export TIMM_FUSED_ATTN=0
in your environment, or call timm.layers.set_fused_attn(False) in your program before creating the model.
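A minimal sketch of how that fits together with the graph-based feature extractor (the model name is illustrative, and exact node names can vary slightly between versions, so check get_graph_node_names first):

```python
import timm
import torch
from torchvision.models.feature_extraction import create_feature_extractor, get_graph_node_names

# Must run before timm.create_model; setting TIMM_FUSED_ATTN=0 in the
# environment has the same effect.
timm.layers.set_fused_attn(False)

model = timm.create_model('vit_base_patch16_224', pretrained=False).eval()

# With fused attention disabled, the per-block softmax shows up as a graph node again.
eval_nodes = get_graph_node_names(model)[1]
softmax_nodes = [n for n in eval_nodes if n.endswith('attn.softmax')]

extractor = create_feature_extractor(model, return_nodes={n: n for n in softmax_nodes})
attn = extractor(torch.randn(1, 3, 224, 224))
print({k: v.shape for k, v in attn.items()})  # one (1, num_heads, N, N) map per block
```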