where is bm25 introduced? #16

tangzhy · 2021-07-23T03:47:12Z

Hi,

For the warm-up step, I see a regular dense retrieval model being trained on the triples.small data provided by MS MARCO.

But I can't find any code that builds a BM25 index or does BM25 sampling.
I guess you are treating the negatives in the triples.small data as BM25 negatives already?

What does BM25 warm-up mean? How is it introduced?

Thanks

juyongjiang · 2022-01-07T14:54:12Z

Yeah, I also can't find the BM25 index. Have you found the answer to it?

MewemeW · 2022-07-28T08:03:36Z

+1

robro612 · 2022-09-15T03:52:48Z

I believe @tangzhy is correct (at least on MS MARCO): triples.train.small.tsv was generated by the MS MARCO dataset itself, and they refer to generating the triples using BM25 in the raw text of the README. That's why there's no reference to BM25 in this repo.
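To illustrate why no BM25 index would be needed in that case, here is a minimal sketch of reading such a triples file. It assumes each line holds a query, a positive passage and a negative passage separated by tabs, and that the negative is the pre-sampled one shipped with the file. The names `Triple` and `read_triples` are made up for this example and are not taken from this repo.

```python
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    query: str
    positive: str
    negative: str  # assumption: the pre-sampled negative that ships with the file


def read_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield (query, positive, negative) triples from tab-separated lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
        yield Triple(*fields)


# Tiny in-memory stand-in for the real file; the text is invented.
sample = ["what is bm25\tBM25 is a ranking function.\tThe weather is nice today.\n"]
triples = list(read_triples(sample))
```

With the real data you would pass an open file handle (for example `open("triples.train.small.tsv", encoding="utf-8")`) instead of `sample`. The point is that the negatives arrive already attached to each query, so the training code never has to run BM25 itself.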
