How to train CLIP from scratch? #133
Replies: 8 comments
-
Are you asking how to train a new tokenizer? If so, you might like
https://huggingface.co/docs/tokenizers/training_from_memory
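For reference, here is a minimal, untested sketch of training a custom (non-BPE) tokenizer in memory with the Hugging Face `tokenizers` library, per that doc page. The word-level model, special tokens, and toy corpus are placeholder choices, not something prescribed by CLIP:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

# A word-level model avoids BPE entirely; swap in whatever suits your task.
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = WordLevelTrainer(special_tokens=["[UNK]", "<start_of_text>", "<end_of_text>"])

# Any iterator over raw caption strings works; this list stands in for your corpus.
captions = ["a photo of a cat", "a photo of a dog sitting on a couch"]
tokenizer.train_from_iterator(captions, trainer=trainer)

ids = tokenizer.encode("a photo of a dog").ids  # no 77-token cap is imposed here
```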
-
Sorry I wasn't clear enough. I want to train CLIP from scratch on my own data, rather than fine-tuning, and then embed it into my task. I'd like to use my own dataset, dataloader, tokenizer, etc. to train CLIP from scratch and use it for image-to-text retrieval. It seems I should use the code in https://github.com/openai/CLIP/blob/main/clip/model.py, but I am not very clear about the initialization and training process of the CLIP model, and whether it involves loading a pre-trained ResNet/ViT model.
-
All the instructions in the README are for training from scratch. In fact, fine-tuning is not really supported.
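To make the initialization question concrete: in open_clip, creating a model without a `pretrained` tag gives randomly initialized weights, so nothing is loaded from a pre-trained ResNet/ViT unless you ask for it. Below is a rough, untested sketch of a single from-scratch training step using open_clip's Python API. The model name, hyperparameters, and hand-written loss are illustrative; the repo's real training loop adds distributed training, AMP, schedulers, and so on:

```python
import torch
import torch.nn.functional as F
import open_clip

# No `pretrained` tag -> randomly initialized weights (training from scratch).
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32')
tokenizer = open_clip.get_tokenizer('ViT-B-32')
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.2)

def train_step(images, captions):
    # images: a (B, 3, 224, 224) batch produced by `preprocess`; captions: list of str.
    texts = tokenizer(captions)
    image_features = F.normalize(model.encode_image(images), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)
    logits = model.logit_scale.exp() * image_features @ text_features.t()
    labels = torch.arange(len(captions))  # the i-th caption matches the i-th image
    # Symmetric contrastive (InfoNCE) loss over both retrieval directions.
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```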
-
Thank you very much.
-
Yes, if you run https://github.com/mlfoundations/open_clip#sample-single-process-running-code for ~6000 GPU-hours on 400M samples for 32 epochs, you should get good results.
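For a sense of scale, here is the back-of-the-envelope arithmetic on those figures (purely illustrative):

```python
# 400M samples x 32 epochs over ~6000 GPU-hours (figures from the reply above).
samples_seen = 400_000_000 * 32    # 12.8 billion image-text pairs processed
gpu_seconds = 6000 * 3600
print(samples_seen / gpu_seconds)  # ~593 samples/sec per GPU, sustained
```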
-
I want to compare the effect of training CLIP from scratch versus fine-tuning on my own dataset. For me, re-training on 400M samples is obviously impossible.
-
@newbietuan training CLIP models, especially ViT ones, from scratch is a data- and compute-hungry endeavour if you want decent results. The README has a SLURM example script that reproduces B/32 training on LAION-400M, and enough detail to adjust the batch size and number of GPUs for the larger models. There is also an example 4-GPU torchrun command that reproduces the 36.5% result on CC12M in the README. For smaller datasets in the 10-40M range you're likely only going to be able to train the smaller ResNets well; the ViTs don't perform well until trained on much larger datasets. I'm moving this to discussions as it's not a bug...
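One detail worth spelling out when changing the GPU count: as I understand it, open_clip's `--batch-size` flag is per GPU, so the global batch the contrastive loss is computed over scales with the number of processes. The numbers below are illustrative:

```python
# Global-batch bookkeeping when scaling an open_clip run across GPUs.
per_gpu_batch = 256                      # value passed via --batch-size (per process)
num_gpus = 4                             # e.g. the torchrun example in the README
global_batch = per_gpu_batch * num_gpus  # pairs the contrastive loss sees per step
print(global_batch)                      # 1024
```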
-
Models like CLIP only make sense if trained on a large amount of data with many GPUs; after being trained that way, they can perform well at zero-shot classification and retrieval. If your data looks even a bit like the CLIP training set or the LAION-400M distribution, you can simply use the pretrained models without any training. See the clip-retrieval module for an example of retrieval usage. If your data is really different, you probably want to start with supervised training, where fine-tuning can work even on a small amount of data.
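For example, here is a minimal zero-shot classification sketch with a pretrained open_clip model; the checkpoint tag, image path, and prompts are placeholders to adapt to your data:

```python
import torch
import open_clip
from PIL import Image

# Load LAION-400M-pretrained weights instead of training anything.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion400m_e32')
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # your image here
texts = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)  # relative match of each prompt to the image
```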
-
Hi, thanks for the great work.
Due to the needs of specific tasks, I want to train CLIP from scratch without using BPE encoding and the 77-token length limit. How should I do this?