
Cannot reproduce "monot5-base-msmarco-10k" via pytorch script #307

Open
polgrisha opened this issue Nov 25, 2022 · 6 comments

@polgrisha

polgrisha commented Nov 25, 2022

Hello!

I am trying to reproduce the quality of monoT5 on the BEIR benchmark from the recent article. But after running the script finetune_monot5.py for one epoch, as stated in the description of the "monot5-base-msmarco-10k" checkpoint, my results are noticeably lower.

For example, on NQ, my checkpoint gets 0.5596 nDCG@10, while the original checkpoint gets 0.5676 nDCG@10. On NFCorpus: 0.3604 nDCG@10 with my checkpoint vs. 0.3778 nDCG@10 with the original.

So, is one epoch of training monoT5 with the PyTorch script equivalent to one epoch of training with TF? And with which hyperparameters can I reproduce the performance of "monot5-base-msmarco-10k"?
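
For reference, a minimal sketch of how a monoT5 checkpoint is typically used to score a query-document pair (the standard "Query: ... Document: ... Relevant:" prompt, with a softmax over the "true"/"false" token logits). This is not necessarily the exact evaluation pipeline used here, just the usual scoring recipe:

```python
# Minimal monoT5 scoring sketch; shown for reference only.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "castorini/monot5-base-msmarco-10k"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()

# Look up the ids of the "true"/"false" target tokens instead of hard-coding them.
true_id = tokenizer.encode("true")[0]
false_id = tokenizer.encode("false")[0]

def monot5_score(query: str, doc: str) -> float:
    text = f"Query: {query} Document: {doc} Relevant:"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Feed only the decoder start token, read the logits of the first generated
    # position, then softmax over the true/false tokens.
    decoder_input_ids = torch.full(
        (1, 1), model.config.decoder_start_token_id, dtype=torch.long
    )
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability of "true" = relevance score

print(monot5_score("what causes fever", "Fever is usually caused by infections."))
```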

@rodrigonogueira4
Member

Hi @polgrisha, we haven't tested that PyTorch script extensively, especially in the zero-shot setting, but it seems that some of its hyperparameters were wrong.

I opened a PR with the ones we used to train the model on TPUs + TF:
#308

Could you please give it a try?

@rodrigonogueira4
Member

I was looking at my logs, and I was never able to reproduce the results on PyTorch+GPU using the same hyperparameters we used to finetune on TF+TPUs. The best ones I found were those already in the repo.

However, in another project, I found that this configuration gives good results for finetuning T5 on PT+GPUs:

--train_batch_size=4
--accumulate_grad_batches=32
--optimizer=AdamW
--lr=3e-4 (or 3e-5)
--weight_decay=5e-5
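
If it helps, here is roughly how those flags are usually interpreted in a plain PyTorch loop (a sketch only, assuming they map onto torch.optim.AdamW and standard gradient accumulation; the actual finetune_monot5.py wiring may differ):

```python
# Sketch of the suggested configuration: AdamW with lr=3e-4 (or 3e-5),
# weight_decay=5e-5, micro-batches of 4, and gradient accumulation over 32
# micro-batches (effective batch of 128 per optimizer step). The toy data
# below only stands in for the real MS MARCO triples.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=5e-5)
accumulate_grad_batches = 32

# Toy micro-batches of 4 monoT5-style prompts with "true" targets.
inputs = tokenizer(["Query: q Document: d Relevant:"] * 4, return_tensors="pt", padding=True)
labels = tokenizer(["true"] * 4, return_tensors="pt", padding=True).input_ids
train_loader = [dict(**inputs, labels=labels)] * 64

model.train()
optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    loss = model(**batch).loss / accumulate_grad_batches
    loss.backward()
    if (step + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()
```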

Could you please give it a try?

@polgrisha
Author

@rodrigonogueira4 Thanks for your response

I tried the hyperparams you suggested:

--train_batch_size=4
--accumulate_grad_batches=32
--optimizer=AdamW
--lr=3e-5
--weight_decay=5e-5

So far, the closest result was obtained by training monoT5 for 9k steps (10k steps is one epoch with batch_size=4, accum_steps=32, and 2 GPUs)

(TREC-COVID: original 0.7845, mine 0.7899; NFCorpus: original 0.3778, mine 0.3731; NQ: original 0.5676, mine 0.5688; FIQA-2018: original 0.4129, mine 0.4130)

@polgrisha reopened this Dec 6, 2022
@rodrigonogueira4
Member

Hi @polgrisha, thanks for running this experiment. It seems that you got pretty close to the original training in mesh-tensorflow+TPUs.

Those small differences on the individual BEIR datasets are expected, especially since you are using a different optimizer.
However, to be really sure, I would run on a few more datasets and compare the average against the results reported in the "No Parameter Left Behind" paper.

@zlh-source

zlh-source commented Jul 2, 2023

(quoting @polgrisha's comment above)

@rodrigonogueira4 Thanks for your response

I tried the hyperparams you suggested:

--train_batch_size=4 --accumulate_grad_batches=32 --optimizer=AdamW --lr=3e-5 --weight_decay=5e-5

So far, the closest result was obtained by training monoT5 for 9k steps (10k steps is one epoch with batch_size=4, accum_steps=32, and 2 GPUs)

(TREC-COVID: original 0.7845, mine 0.7899; NFCorpus: original 0.3778, mine 0.3731; NQ: original 0.5676, mine 0.5688; FIQA-2018: original 0.4129, mine 0.4130)

Hello, thank you very much for your work! But I still have some questions.
With batch_size=4, accum_steps=32, and 2 GPUs, one step corresponds to a batch size of 4*32*2 = 256. The Hugging Face checkpoint "monot5-base-msmarco-10k" was trained for 10k steps with a batch size of 128, using the first 6.4e5 lines of the training set. So:
(1) Did you use twice as much data as "monot5-base-msmarco-10k"?
(2) Or did you also use the first 6.4e5 lines, but go over them twice?
(3) Or did you also use the first 6.4e5 lines, but train for only 5k steps because the batch size is twice as large?
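
For concreteness, a quick sketch of the examples-seen arithmetic behind these three readings (plain counting only; it does not say which reading matches the released checkpoint):

```python
# Number of training examples seen per configuration, using the figures quoted above.
def examples_seen(steps, per_gpu_batch, accum_steps=1, num_gpus=1):
    return steps * per_gpu_batch * accum_steps * num_gpus

print(examples_seen(10_000, 4, accum_steps=32, num_gpus=2))  # 2,560,000 (256 per step, readings 1/2)
print(examples_seen(10_000, 128))                            # 1,280,000 (the 10k-step, batch-128 recipe)
print(examples_seen(5_000, 4, accum_steps=32, num_gpus=2))   # 1,280,000 (reading 3)
```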

@rodrigo-f-nogueira

Sorry about the late reply. The correct configuration should be batches of 128 examples, so 10k steps means 6.4M lines of the triples.train.small.tsv file.
