Training CLIP-ViT #58
Comments
Did you write the training code yourself?
Yes, it's not hard.
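For context, the core of such training code is the symmetric contrastive objective described in the CLIP paper. Below is a minimal PyTorch sketch of that loss only; the function name is mine, `logit_scale` is assumed to already be the exponentiated learned temperature, and the encoders, data loading, and mixed-precision details are omitted.

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric cross-entropy over image-text similarities, as in the CLIP paper."""
    # Normalize so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity logits, scaled by the learned temperature.
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()

    # The matching text for image i sits at index i (and vice versa).
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_text, labels)
    return (loss_i + loss_t) / 2
```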
Are you comparing ViT-based and ResNet50-based models trained with the same dataset? Vision transformers tend to underperform ResNet-based models unless they're trained on a huge dataset, so I'd suspect that could have been the reason, rather than the initialization scheme.
Thanks for your reply. We trained on 100M text-image pairs. It turned out that the ViT-B/32 model outperformed the ResNet50 model on some of the benchmarks.
@Meteorix what dataset did you use?
Hi, Meteorix:
@dragen1860 see #83 and also other third-party implementations by @KeremTurgutlu and @lucidrains.
Hi, I am also running ViT-B/32 with the open_clip code on CC3M, and I also notice a discrepancy between ResNet and ViT. You can see my curves at mlfoundations/open_clip#14. I am interested to know whether you solved this performance gap @jongwook @Meteorix
@jongwook Thanks for this great work!
I am trying to train CLIP ViT-B/32 from scratch, but I cannot get a higher score on ImageNet than CLIP ResNet-50. May I ask what initialization you used when training the ViT?
In the paper:
We closely follow their implementation with only the minor modification of adding an additional layer normalization to the combined patch and position embeddings before the transformer and use a slightly different initialization scheme.
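For reference, here is a minimal sketch of those two modifications, loosely following the released `VisionTransformer` in `clip/model.py`: the combined patch and position embeddings pass through an extra LayerNorm (`ln_pre`) before the transformer, and the class/position embeddings are drawn from a Gaussian scaled by `width ** -0.5` rather than the default. The module name `ViTStem` and the ViT-B/32 dimensions are my own illustration, and the transformer blocks themselves are omitted.

```python
import torch
from torch import nn


class ViTStem(nn.Module):
    """Patch embedding + class token + positional embedding, followed by the
    extra pre-transformer LayerNorm described in the paper."""

    def __init__(self, input_resolution: int = 224, patch_size: int = 32, width: int = 768):
        super().__init__()
        # Non-overlapping patch embedding via a strided convolution.
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch_size, stride=patch_size, bias=False)

        # Class token and positional embedding initialized with std ~ width^-0.5,
        # i.e. a scaled-down Gaussian rather than the default N(0, 1).
        scale = width ** -0.5
        self.class_embedding = nn.Parameter(scale * torch.randn(width))
        num_tokens = (input_resolution // patch_size) ** 2 + 1
        self.positional_embedding = nn.Parameter(scale * torch.randn(num_tokens, width))

        # Additional LayerNorm applied to the combined patch + position
        # embeddings before the transformer blocks.
        self.ln_pre = nn.LayerNorm(width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv1(x)                   # [B, width, grid, grid]
        x = x.flatten(2).permute(0, 2, 1)   # [B, grid*grid, width]
        cls = self.class_embedding.expand(x.shape[0], 1, -1)
        x = torch.cat([cls, x], dim=1)      # prepend class token
        x = x + self.positional_embedding
        return self.ln_pre(x)               # normalize before the transformer


# Quick shape check
stem = ViTStem()
tokens = stem(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 50, 768])
```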