where is bm25 introduced? #16

Open
tangzhy opened this issue Jul 23, 2021 · 3 comments

tangzhy commented Jul 23, 2021

Hi,

For the warm-up step, I see a regular dense retrieval model being trained on the triples.small data provided by MS MARCO.

However, I can't find any code that builds a BM25 index or samples BM25 negatives.
I guess you are treating the negatives in the triples.small data as BM25 negatives already?

What does "BM25 warm-up" mean, and how is it introduced?

Thanks

@juyongjiang

Yeah, I also can't find the BM25 index. Have you found the answer to it?

MewemeW commented Jul 28, 2022

+1

@robro612

I believe @tangzhy is correct (at least on MSMARCO). The triples.train.small.tsv file was generated as part of the MSMARCO dataset itself, and they refer to generating the triples using BM25 in the raw text of the README. That is why there's no reference to BM25 in this repo: the negatives in the training triples are already BM25 negatives.
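
For readers wondering what "BM25 negatives" look like in practice, here is a minimal sketch of mining (query, positive, negative) triples with BM25. It is not the code used to build triples.train.small.tsv and it is not part of this repo. The helper names (`tokenize`, `bm25_scores`, `mine_triples`), the whitespace tokenizer, the `k1`/`b` defaults, and the toy passages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; real pipelines use a proper analyzer).
    return text.lower().split()


def bm25_scores(query, passages, k1=1.2, b=0.75):
    """Score every passage against the query with Okapi BM25."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def mine_triples(query, positive_idx, passages, num_negatives=1):
    """Pair the labeled positive with the highest-scoring non-relevant passages."""
    scores = bm25_scores(query, passages)
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    negatives = [i for i in ranked if i != positive_idx][:num_negatives]
    return [(query, passages[positive_idx], passages[i]) for i in negatives]


# Toy corpus, invented for this example.
passages = [
    "BM25 is a bag-of-words ranking function used by search engines.",
    "Dense retrieval encodes queries and passages into vectors.",
    "BM25 ranking can surface passages that share words but miss the answer.",
    "The weather in Seattle is often rainy.",
]
triples = mine_triples("what is bm25 ranking", positive_idx=0, passages=passages)
for q, pos, neg in triples:
    # One tab-separated line per triple: query, positive passage, negative passage.
    print("\t".join([q, pos, neg]))
```

If the training triples were already built along these lines, a warm-up stage only has to read them from disk, which would explain why no BM25 index appears in the training code.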
