CoCa v2: fixes and improvements #554

iejMac · 2023-06-27T13:58:01Z

There has been some issues raised about some mistakes made in the current CoCa implementation and there are also some improvements that can be made. This issue will enumerate them and we can track progress here. Once the TODO's are completed we can attempt another set of re-training runs and maybe even scale if the results make sense.

Problems:

MultimodalTransformer initialization (CoCa: fix MultimodalTransformer init + Mask CLS token at end of seq #551)
Mask CLS token at end of sequence (CoCa: fix MultimodalTransformer init + Mask CLS token at end of seq #551)
Number of tokens problem (Issue with number of tokens for CoCa #458)
Fix pooling

Improvements:

Integrate generation with HF (so we can take out generate func)

Please add anything that I might've missed
cc: @gpucce @rom1504 @rwightman

iejMac · 2023-06-27T13:59:07Z

I will try to start a small B/32 run with the first 2 problems "solved" in that PR. We can compare to the first few B/32 runs.

gpucce · 2023-06-27T14:35:10Z

	dataset	model	acc1	acc5
0	imagenet1k	coca_ViT-B-32	0.636	0.881
1	imagenet1k	coca_ViT-B-32_fixed_cls_mask	0.638	0.882
2	imagenet1k	coca_ViT-L-14	0.756	0.943
3	imagenet1k	coca_ViT-L-14_fixed_cls_mask	0.755	0.941

Changing cls mask leaves performance almost unchanged without retraining.

rwightman · 2023-06-27T17:09:36Z

Re the remove MHA from attn pooler, I don't recall what the motivation for that was? It doesn't lool like it's doing anything that isn't supported by MHA at this point (like q/k norms, etc)...

gpucce · 2023-06-27T17:13:42Z

Re the remove MHA from attn pooler, I don't recall what the motivation for that was? It doesn't lool like it's doing anything that isn't supported by MHA at this point (like q/k norms, etc)...

the point was to split the linear layer passing the "cls" token to clip loss and the one passing the remaining ones to the decoder because this is a difference with respect to the original paper.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CoCa v2: fixes and improvements #554

CoCa v2: fixes and improvements #554

iejMac commented Jun 27, 2023 •

edited

Loading

iejMac commented Jun 27, 2023

gpucce commented Jun 27, 2023

rwightman commented Jun 27, 2023

gpucce commented Jun 27, 2023

CoCa v2: fixes and improvements #554

CoCa v2: fixes and improvements #554

Comments

iejMac commented Jun 27, 2023 • edited Loading

iejMac commented Jun 27, 2023

gpucce commented Jun 27, 2023

rwightman commented Jun 27, 2023

gpucce commented Jun 27, 2023

iejMac commented Jun 27, 2023 •

edited

Loading