Is your feature request related to a problem? Please describe.
Currently, when changing the ViT img size from a rectangular size, resample_abs_pos_embed() does not work correctly, since it does not know the original rectangular size and assumes a square.
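To make the failure concrete, here is a minimal sketch of the grid arithmetic. It assumes the square fallback takes the integer square root of the patch-token count, and it uses the 1024x128 input with patch size 16 from the AudioMAE checkpoint name. The helper names `square_grid` and `grid_from_img_size` are made up for illustration and are not timm functions.

```python
import math


def square_grid(num_patches):
    """Guess a grid from the token count alone, assuming it is square."""
    side = int(math.sqrt(num_patches))
    return side, side


def grid_from_img_size(img_size, patch_size):
    """Compute the true (height, width) patch grid from the image size."""
    return img_size[0] // patch_size, img_size[1] // patch_size


# Rectangular mel-spectrogram input: 1024 x 128 with 16 x 16 patches.
true_grid = grid_from_img_size((1024, 128), 16)
num_patches = true_grid[0] * true_grid[1]

# The square guess cannot recover the real layout from the count alone.
guessed_grid = square_grid(num_patches)

print("true grid:", true_grid, "tokens:", num_patches)
print("square guess:", guessed_grid, "tokens:", guessed_grid[0] * guessed_grid[1])
```

The true grid is 64 x 8 (512 tokens), while the square guess is 22 x 22 (484 tokens), so the token count no longer matches and the embedding cannot be reshaped into the guessed grid.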
I understand this is not straightforward to support: once the model is created (with the updated image size), the original image size is lost. Some hacks could probably work around this, but they are not very nice. Some options:
Propagate the original image size to the _load_weights() function.
Create a model with the original image size and load weights as usual, then add a new method like .set_img_size() which updates the internal img_size attribute and resamples the pos embed.
Perhaps an easier solution is to fix dynamic img size to pass the original img size (which I tested locally and works).
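The common thread in these options is that the resampling step receives the original grid explicitly instead of guessing it. A rough pure-Python sketch of that idea follows. It uses bilinear interpolation on nested lists purely for illustration (timm's actual resampling goes through PyTorch interpolation, bicubic by default as far as I know), and `resample_grid` is a hypothetical name, not an existing API.

```python
def resample_grid(pos_embed, old_size, new_size):
    """Bilinearly resample a flattened (H*W, C) position embedding.

    pos_embed: list of H*W tokens in row-major order, each a list of C floats.
    old_size, new_size: explicit (height, width) grids; nothing is inferred.
    """
    old_h, old_w = old_size
    new_h, new_w = new_size
    if len(pos_embed) != old_h * old_w:
        raise ValueError("old_size does not match the number of tokens")
    dim = len(pos_embed[0])

    def src_coord(i, new_n, old_n):
        # Map an output index to a source coordinate (pixel centers aligned),
        # clamped so border outputs reuse the border values.
        x = (i + 0.5) * old_n / new_n - 0.5
        return min(max(x, 0.0), old_n - 1.0)

    out = []
    for y in range(new_h):
        fy = src_coord(y, new_h, old_h)
        y0 = int(fy)
        y1 = min(y0 + 1, old_h - 1)
        wy = fy - y0
        for x in range(new_w):
            fx = src_coord(x, new_w, old_w)
            x0 = int(fx)
            x1 = min(x0 + 1, old_w - 1)
            wx = fx - x0
            tl = pos_embed[y0 * old_w + x0]  # top-left neighbour
            tr = pos_embed[y0 * old_w + x1]  # top-right neighbour
            bl = pos_embed[y1 * old_w + x0]  # bottom-left neighbour
            br = pos_embed[y1 * old_w + x1]  # bottom-right neighbour
            out.append([
                (1 - wy) * ((1 - wx) * tl[c] + wx * tr[c])
                + wy * ((1 - wx) * bl[c] + wx * br[c])
                for c in range(dim)
            ])
    return out


# Toy 4 x 2 grid whose embedding is just (row, col), resampled to 8 x 4.
toy = [[float(r), float(c)] for r in range(4) for c in range(2)]
resized = resample_grid(toy, old_size=(4, 2), new_size=(8, 4))
print(len(resized))
```

Because `old_size` is a required argument, a rectangular source grid is handled the same way as a square one, and a mismatch between the grid and the token count raises an error instead of silently producing a wrong layout.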
@gau-nernst I have thought about this, and yeah it's not done because to do it at weight load time where it's done now, it's more complexity than I'd like, especially for the benefit/demand.
@gau-nernst on PR #2225 there is a first pass at implementing a set_input_size fn... currently it should mostly work for models from vision_transformer.py, vision_transformer_hybrid.py, and swin_transformer.py (v1)
pytorch-image-models/timm/models/vision_transformer.py
Lines 1096 to 1103 in 5dce710
pytorch-image-models/timm/layers/pos_embed.py
Lines 32 to 34 in 5dce710
Describe the solution you'd like
It should work out of the box.
Describe alternatives you've considered
Manually resize it.
Additional context
Apparently dynamic img size also does not work when the original img size is rectangular.
pytorch-image-models/timm/models/vision_transformer.py
Lines 603 to 609 in 5dce710
This is a rare problem since most image ViTs use square inputs. The particular model I'm using is my previously ported AudioMAE (https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m), which uses a rectangular input (mel-spectrogram).