Cannot reproduce "monot5-base-msmarco-10k" via pytorch script #307
Hi @polgrisha, we haven't tested that pytorch script extensively, especially in the zero-shot setting, but it seems that some hyperparameters were wrong. I opened a PR with the ones we used to train the model on TPUs + TF. Could you please give it a try?
I was looking at my logs, and I was never able to reproduce the results on pytorch+GPU using the same hyperparameters used to finetune on TF+TPUs. The best ones I found were the ones already in the repo. However, in another project, I found that this configuration gives good results when finetuning T5 on PT+GPUs: --train_batch_size=4. Could you please give it a try?
@rodrigonogueira4 Thanks for your response. I tried the hyperparameters you suggested: --train_batch_size=4. So far, the closest results were obtained by training monoT5 for 9k steps (10k steps is one epoch with batch_size=4, accum_steps=32, and 2 GPUs): TREC-COVID: original 0.7845, mine 0.7899; NFCorpus: original 0.3778, mine 0.3731; NQ: original 0.5676, mine 0.5688; FIQA-2018: original 0.4129, mine 0.4130.
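As a sanity check on that arithmetic, here is a small sketch that just multiplies the numbers quoted above (per-GPU batch 4, 32 accumulation steps, 2 GPUs, 10k steps); it assumes the effective batch is simply their product, which is how gradient accumulation and data parallelism usually compose.

```python
# Illustrative arithmetic only, using the numbers quoted in this thread.
per_gpu_batch = 4
accum_steps = 32
n_gpus = 2
train_steps = 10_000

# Effective examples per optimizer step, assuming accumulation and data
# parallelism both multiply the per-GPU batch.
effective_batch = per_gpu_batch * accum_steps * n_gpus   # 256
examples_seen = effective_batch * train_steps            # 2,560,000

print(f"effective batch size: {effective_batch}")
print(f"examples seen after {train_steps} steps: {examples_seen:,}")
```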
Hi @polgrisha, thanks for running this experiment. It seems that you got pretty close to the original training in mesh-tensorflow+TPUs. I would expect those small differences on the individual BEIR datasets, especially since you are using a different optimizer.
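Regarding the optimizer difference: one way to move the PyTorch run closer to the original mesh-tensorflow recipe would be to fine-tune with Adafactor instead of AdamW. This is only a sketch using the Adafactor implementation shipped with Hugging Face transformers; the constant learning rate of 1e-3 is the common T5 fine-tuning default and an assumption here, not necessarily the exact TPU setting.

```python
from transformers import T5ForConditionalGeneration
from transformers.optimization import Adafactor

# Sketch: swap AdamW for Adafactor to mimic the original T5 recipe more closely.
# lr=1e-3 with a constant schedule is the usual T5 fine-tuning default; whether
# it matches the exact TPU run discussed here is an assumption.
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```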
Hello, thank you very much for your work! But I still have some questions.
Sorry about the late reply. The correct configuration should be batches of 128 examples, so 10k steps means 6.4M lines of the triples.train.small.tsv file.
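One thing that may help when counting lines versus training examples: each line of triples.train.small.tsv is a (query, positive passage, negative passage) triple, and monoT5-style pointwise training expands it into two examples in the "Query: ... Document: ... Relevant:" format with true/false targets. The sketch below illustrates that expansion; whether finetune_monot5.py does exactly this internally is an assumption on my part.

```python
import csv

def triples_to_monot5_examples(path):
    """Expand each (query, positive, negative) line of triples.train.small.tsv
    into two pointwise monoT5 training examples (true / false targets).
    Sketch only; the actual preprocessing in the repo script may differ."""
    with open(path, newline="", encoding="utf-8") as f:
        for query, positive, negative in csv.reader(f, delimiter="\t"):
            yield f"Query: {query} Document: {positive} Relevant:", "true"
            yield f"Query: {query} Document: {negative} Relevant:", "false"
```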
Hello!
I am trying to reproduce the quality of monoT5 on the BEIR benchmark from the recent article. But after running the script finetune_monot5.py for one epoch, as stated in the description of the "monot5-base-msmarco-10k" checkpoint, my results are noticeably lower. For example, on NQ my checkpoint gives 0.5596 ndcg@10, while the original checkpoint gives 0.5676 ndcg@10. On NFCorpus: 0.3604 ndcg@10 with my checkpoint versus 0.3778 ndcg@10 with the original.
So, is one epoch of training monoT5 with the pytorch script equivalent to one epoch of training with TF? And with which hyperparameters can I reproduce the performance of "monot5-base-msmarco-10k"?
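For anyone comparing their own checkpoint against the released one, here is a minimal scoring sketch using plain Hugging Face transformers rather than the pygaggle reranker. The input template follows the monoT5 papers; reading the relevance score from the softmax over the "true"/"false" logits at the first decoded position is my assumption about a reasonable way to inspect the model.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Minimal sketch: score one (query, document) pair with the released checkpoint.
model_name = "castorini/monot5-base-msmarco-10k"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()

query = "what causes fever"
document = "Fever is usually caused by an infection ..."
inputs = tokenizer(f"Query: {query} Document: {document} Relevant:", return_tensors="pt")

# monoT5 emits "true"/"false"; take the probability of "true" at the first
# decoded position as the relevance score (softmax over the two token logits).
true_id = tokenizer.encode("true")[0]
false_id = tokenizer.encode("false")[0]
with torch.no_grad():
    logits = model(
        **inputs,
        decoder_input_ids=torch.full((1, 1), model.config.decoder_start_token_id),
    ).logits[0, 0]
score = torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()
print(f"relevance score: {score:.4f}")
```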