Fine-Tuning CNN models #3182

Answered by andsteing
IMvision12 asked this question in General
Jul 5, 2023 · 1 comment · 3 replies

Answer copied from google-research/vision_transformer#274:

When you fine-tune a ViT model at a different image size than it was pre-trained on, you'll need to adjust the position embeddings accordingly, because a different image size yields a different number of patches. Section 3.2 of the ViT paper proposes 2D interpolation of the pre-trained position embeddings.

This is supported in this codebase when loading a checkpoint:

https://github.com/google-research/vision_transformer/blob/297866ab49341257e6f657d7f1068164c8eaf338/vit_jax/checkpoint.py#L192-L201

This is done automatically when you call checkpoint.load_pretrained() and provide both init_params that expect a certain image size (e.g. 128 in your example), and load from a checkpoint that has weights that were tr…
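For illustration, here is a minimal sketch of the 2D interpolation idea: split off the class token, reshape the grid embeddings back to 2D, resize, and flatten again. This uses jax.image.resize for brevity; the function name and the assumption that the embedding layout is [1, 1 + grid**2, dim] with a leading class token are mine, and the actual vit_jax checkpoint code may differ in details.

```python
import numpy as np
import jax
import jax.numpy as jnp


def interpolate_posemb(posemb, new_grid_size):
    """2D-interpolate ViT position embeddings to a new grid size.

    Assumes posemb has shape [1, 1 + old_grid**2, dim], where the first
    token is the class-token embedding (kept unchanged).
    """
    cls_tok, grid = posemb[:, :1], posemb[:, 1:]
    old_grid_size = int(np.sqrt(grid.shape[1]))
    dim = grid.shape[-1]
    # Restore the 2D patch grid, resize it bilinearly, then flatten again.
    grid = grid.reshape(1, old_grid_size, old_grid_size, dim)
    grid = jax.image.resize(
        grid, (1, new_grid_size, new_grid_size, dim), method='bilinear')
    grid = grid.reshape(1, new_grid_size * new_grid_size, dim)
    return jnp.concatenate([cls_tok, grid], axis=1)
```

For example, going from 224px pre-training to 128px fine-tuning with 16px patches means resizing a 14×14 embedding grid down to 8×8.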

Answer selected by IMvision12