How to train CLIP from scratch? #133
Replies: 8 comments
-
Are you asking how to train a new tokenizer? If so, you might like
https://huggingface.co/docs/tokenizers/training_from_memory
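For reference, here is a minimal, untested sketch of training a custom (non-BPE) tokenizer in memory with the Hugging Face `tokenizers` library, per that doc page. The word-level model, special tokens, and toy corpus are placeholder choices, not something prescribed by CLIP:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

# A word-level model avoids BPE entirely; swap in whatever suits your task.
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = WordLevelTrainer(special_tokens=["[UNK]", "<start_of_text>", "<end_of_text>"])

# Any iterator over raw caption strings works; this list stands in for your corpus.
captions = ["a photo of a cat", "a photo of a dog sitting on a couch"]
tokenizer.train_from_iterator(captions, trainer=trainer)

ids = tokenizer.encode("a photo of a dog").ids  # no 77-token cap is imposed here
```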
-
Sorry I wasn't clear enough. I want to train CLIP from scratch on my own data, rather than fine-tuning, and then embed it into my task. I'd like to use my own dataset, dataloader, tokenizer, etc. to train CLIP from scratch and use it for image-to-text retrieval. It seems I should use the code in https://github.com/openai/CLIP/blob/main/clip/model.py, but I am not very clear about the initialization and training process of the CLIP model, and whether it involves loading a pre-trained ResNet/ViT model.
-
All the instructions in the README are for training from scratch. In fact, fine-tuning is not really supported.
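To make the initialization question concrete: in open_clip, creating a model without a `pretrained` tag gives randomly initialized weights, so nothing is loaded from a pre-trained ResNet/ViT unless you ask for it. Below is a rough, untested sketch of a single from-scratch training step using open_clip's Python API. The model name, hyperparameters, and hand-written loss are illustrative; the repo's real training loop adds distributed training, AMP, schedulers, and so on:

```python
import torch
import torch.nn.functional as F
import open_clip

# No `pretrained` tag -> randomly initialized weights (training from scratch).
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32')
tokenizer = open_clip.get_tokenizer('ViT-B-32')
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.2)

def train_step(images, captions):
    # images: a (B, 3, 224, 224) batch produced by `preprocess`; captions: list of str.
    texts = tokenizer(captions)
    image_features = F.normalize(model.encode_image(images), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)
    logits = model.logit_scale.exp() * image_features @ text_features.t()
    labels = torch.arange(len(captions))  # the i-th caption matches the i-th image
    # Symmetric contrastive (InfoNCE) loss over both retrieval directions.
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```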
-
Thank you very much.
-
Yes, if you run https://github.com/mlfoundations/open_clip#sample-single-process-running-code for ~6000 GPU-hours on 400M samples for 32 epochs, you should get good results.
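For a sense of scale, here is the back-of-the-envelope arithmetic on those figures (purely illustrative):

```python
# 400M samples x 32 epochs over ~6000 GPU-hours (figures from the reply above).
samples_seen = 400_000_000 * 32    # 12.8 billion image-text pairs processed
gpu_seconds = 6000 * 3600
print(samples_seen / gpu_seconds)  # ~593 samples/sec per GPU, sustained
```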
-
I want to compare the effect of training CLIP from scratch versus fine-tuning on my own dataset. For me, re-training on 400M samples is obviously impossible.
-
@newbietuan training CLIP models, especially ViT ones, from scratch is a data- and compute-hungry endeavour if you want decent results. The README has a SLURM example script that reproduces B/32 training on LAION-400M, and enough detail to adjust the batch size and number of GPUs for the larger models. There is also an example 4-GPU torchrun command that reproduces the 36.5% result on CC12M in the README. For smaller datasets in the 10-40M range you're likely only going to be able to train the smaller ResNets well; the ViTs don't perform well until trained on much larger datasets. I'm moving this to discussions as it's not a bug...
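One detail worth spelling out when changing the GPU count: as I understand it, open_clip's `--batch-size` flag is per GPU, so the global batch the contrastive loss is computed over scales with the number of processes. The numbers below are illustrative:

```python
# Global-batch bookkeeping when scaling an open_clip run across GPUs.
per_gpu_batch = 256                      # value passed via --batch-size (per process)
num_gpus = 4                             # e.g. the torchrun example in the README
global_batch = per_gpu_batch * num_gpus  # pairs the contrastive loss sees per step
print(global_batch)                      # 1024
```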
-
Models like CLIP only make sense if trained on a large amount of data with many GPUs; after being trained that way, they can perform well at zero-shot classification and retrieval. If your data looks even a bit like the CLIP training set or the LAION-400M distribution, you can simply use the pretrained models without any training. See the clip-retrieval module for an example of retrieval usage. If your data is really different, you probably want to start with supervised training, where fine-tuning can work even on a small amount of data.
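For example, here is a minimal zero-shot classification sketch with a pretrained open_clip model; the checkpoint tag, image path, and prompts are placeholders to adapt to your data:

```python
import torch
import open_clip
from PIL import Image

# Load LAION-400M-pretrained weights instead of training anything.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion400m_e32')
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # your image here
texts = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)  # relative match of each prompt to the image
```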
-
Hi, thanks for the great work.
Due to the needs of specific tasks, I want to train CLIP from scratch without using BPE encoding and the 77-token length limit. How should I do this?