CoCa v2: fixes and improvements #554
Comments
I will try to start a small B/32 run with the first two problems "solved" in that PR. We can compare it to the first few B/32 runs.
Changing the cls mask leaves performance almost unchanged without retraining.
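For context, a minimal sketch of the kind of cls-mask construction being discussed, assuming a text tower that appends one CLS token at the end of the sequence and pads with token id 0; `build_cls_mask`, the shapes, and the pad id are illustrative assumptions, not the exact open_clip code:

```python
import torch

def build_cls_mask(text: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    # Hypothetical sketch: additive attention mask for a text sequence with
    # one CLS token appended at the end. Every query position (CLS included)
    # may attend to real tokens and to CLS, but never to padding.
    batch, seq_len = text.shape
    keep = text != pad_id                                  # (B, L), True = real token
    keep = torch.cat([keep, keep.new_ones(batch, 1)], 1)   # CLS column always visible
    keep = keep.unsqueeze(1).expand(-1, seq_len + 1, -1)   # (B, L+1, L+1) keys per query
    mask = torch.zeros(keep.shape, device=text.device)
    return mask.masked_fill(~keep, float("-inf"))
```

An additive mask like this gets summed with the causal mask before attention, so a mistake in the CLS row or column is easy to miss, which is consistent with the observation that fixing it barely moves the metrics without retraining.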
Re removing the MHA from the attn pooler: I don't recall what the motivation for that was. It doesn't look like it's doing anything that isn't supported by MHA at this point (like q/k norms, etc.)...
The point was to split the linear layer that passes the "cls" token to the CLIP loss from the one that passes the remaining tokens to the decoder, because this is a difference with respect to the original paper.
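To make that split concrete, here is a minimal sketch under the assumption that the pooler is built on torch's `nn.MultiheadAttention`; the `AttentionalPooler` class and all sizes are illustrative, and the 1-query / 256-query pairing follows the CoCa paper's task-specific poolers rather than the current open_clip code:

```python
import torch
import torch.nn as nn

class AttentionalPooler(nn.Module):
    # Learned-query attention pooling built on standard nn.MultiheadAttention.
    def __init__(self, d_model: int, n_head: int, n_queries: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(n_queries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the learned queries over the batch, attend over x's tokens.
        q = self.query.unsqueeze(0).expand(x.shape[0], -1, -1)
        out, _ = self.attn(q, x, x, need_weights=False)
        return self.ln(out)

# Two separate poolers as in the paper: a single query produces the
# contrastive ("cls") embedding for the CLIP loss, while a larger set of
# queries produces the tokens the caption decoder cross-attends to.
contrastive_pooler = AttentionalPooler(d_model=512, n_head=8, n_queries=1)
caption_pooler = AttentionalPooler(d_model=512, n_head=8, n_queries=256)

image_tokens = torch.randn(4, 197, 512)             # (batch, patches, width)
contrastive_emb = contrastive_pooler(image_tokens)  # (4, 1, 512)   -> CLIP loss
caption_tokens = caption_pooler(image_tokens)       # (4, 256, 512) -> decoder
```

With two poolers (or at least two projections), the contrastive path and the captioning path no longer share a single linear layer, which is the discrepancy with the paper described above.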
Issue description
There have been some issues raised about mistakes in the current CoCa implementation, and there are also some improvements that can be made. This issue enumerates them so we can track progress here. Once the TODOs are completed, we can attempt another set of re-training runs and maybe even scale up if the results make sense.
Problems:
- The cls mask used in the text tower (see the comments above).
- The attentional pooler: the contrastive "cls" token and the tokens passed to the decoder currently share a single projection, unlike the original paper.
Improvements:
- Rework the attentional pooler accordingly, e.g. with separate poolers or projections for the contrastive and captioning paths.
Please add anything that I might've missed.
cc: @gpucce @rom1504 @rwightman