Swin Transformer uses non-overlapping windows with local attention, which is different from this model; Swin does that to combat the quadratic attention complexity that comes with the larger patch count resulting from smaller patches.
Using flash attention, this model can directly take the 1024-pixel input, which somewhat addresses that issue (the patch size is fixed at 14, but it allows for higher-resolution images).
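To make the scaling concrete, here is a back-of-the-envelope count of patch tokens at patch size 14 (the side lengths below are just illustrative multiples of 14, not anything prescribed by the model):

```python
# Patch-token count and pairwise-attention size for a ViT with 14x14 patches.
patch = 14
for side in (224, 518, 1022):            # illustrative sides, each a multiple of 14
    n_tokens = (side // patch) ** 2      # patch tokens per image (ignoring the CLS token)
    print(f"{side}px -> {n_tokens} tokens, ~{n_tokens ** 2:,} attention entries per head")
```

The attention matrix grows with the square of the token count, which is why memory-efficient attention kernels matter once you push toward 1024-pixel inputs.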
Now, if you want local attention within a bigger image, nothing stops you from cropping the image into 4, 9, 16, ... non-overlapping pieces and feeding those into the network. This gives you local attention within each piece.
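A minimal sketch of that tiling idea, assuming the image side is divisible by the number of tiles per side and that `backbone` stands in for whatever DINOv2 model you load (both are placeholders, not an API from the repo):

```python
import torch

def tile_image(img: torch.Tensor, n: int) -> torch.Tensor:
    """Split a (B, C, H, W) image into n*n non-overlapping crops, batched together.

    Assumes H and W are divisible by n (pick sizes so each crop stays a multiple of 14).
    """
    B, C, H, W = img.shape
    th, tw = H // n, W // n
    tiles = img.unfold(2, th, th).unfold(3, tw, tw)        # (B, C, n, n, th, tw)
    return tiles.permute(0, 2, 3, 1, 4, 5).reshape(-1, C, th, tw)

img = torch.randn(1, 3, 896, 896)   # 896 = 14 * 64, so each 448px crop is still a multiple of 14
crops = tile_image(img, n=2)        # (4, 3, 448, 448)
# feats = backbone(crops)           # hypothetical: attention is now local to each crop
```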
Hi,
As noted by @ccharest93, the model architecture is simply different; you won't get the same feature map shapes as with a Swin.
If you'd like feature maps at different resolutions (e.g. to feed a decoder such as UPerNet), you can downsample the high-resolution feature maps with average pooling (the general idea in https://arxiv.org/abs/2203.16527).
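A minimal sketch of that pooling idea (a simplified take on the simple feature pyramid from the paper above, not its exact recipe): start from a single stride-14 feature map and derive coarser and finer levels from it. `feat` is a placeholder tensor with the shape a ViT-L/14 would produce for a 1022px input.

```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 1024, 73, 73)   # (B, C, H/14, W/14) for a 1022px input to a ViT-L/14

pyramid = {
    "stride_7":  F.interpolate(feat, scale_factor=2, mode="bilinear", align_corners=False),
    "stride_14": feat,
    "stride_28": F.avg_pool2d(feat, kernel_size=2),
    "stride_56": F.avg_pool2d(feat, kernel_size=4),
}
for name, level in pyramid.items():
    print(name, tuple(level.shape))
```

These levels can then be fed to a decoder that expects a Swin-like pyramid; a 1x1 conv per level would let you match whatever channel counts the decoder expects.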
When I pass a 1024px input and get intermediate image features from a Swin Transformer, I get feature maps of these sizes:
torch.Size([1, 128, 256, 256])
torch.Size([1, 128, 256, 256])
torch.Size([1, 256, 128, 128])
torch.Size([1, 512, 64, 64])
torch.Size([1, 1024, 32, 32])
How do I get something like this from dinov2?
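For reference, a rough sketch of one way to get spatial feature maps out of DINOv2 (all at stride 14, unlike Swin's multi-stride pyramid), assuming the torch.hub dinov2_vitl14 checkpoint; the block indices are only an example:

```python
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
backbone.eval()

img = torch.randn(1, 3, 1022, 1022)   # side length must be a multiple of the 14px patch size

with torch.no_grad():
    feats = backbone.get_intermediate_layers(
        img,
        n=[5, 11, 17, 23],   # example block indices for the 24-block ViT-L
        reshape=True,        # return (B, C, H/14, W/14) maps instead of token sequences
    )

for f in feats:
    print(tuple(f.shape))    # each: (1, 1024, 73, 73); pool these to build coarser levels
```

Combined with the average-pooling step above, this gives a set of shapes roughly analogous to the Swin output list.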