Add support for BM25Retriever in InMemoryDocumentStore #3447
Comments
Some additional context (same idea from @bglearning): deepset-ai/haystack-tutorials#44 (comment) |
BM25 was recently added to gensim: piskvorky/gensim#3304, we might use this. |
@ZanSara: please advise me on how I can contribute to Haystack in general, or to this issue in particular. Thanks :) |
Hello @vtharmalingam! Start from here: https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md |
Hi @vtharmalingam great to hear that you would like to contribute on this issue! Feel free to open a draft pull request early on so that we can help with feedback. Please let me know if you need any other help. |
I am also interested in contributing to this feature! After the discussions linked above, I want to bring some ideas.
WDYT? |
Hey @anakin87! Sounds good 🙁
Fully agree!
I also agree on this. It brings in no dependencies, which is a relief 😄 For now at least I imagine it would be a nice compromise. We should make sure that for workloads like the one in the tutorials it doesn't become too slow, however. Let's see after how many documents the 1s threshold for retrieval is reached.
I trust you on this for now, I haven't read the
This issue really has no good answer in the current architecture, but so is the amazing Something like Then we can do the same to the other docstores for |
I'm starting to work on this... 🛠️ |
Is your feature request related to a problem? Please describe.
Many of our tutorials are using the ElasticsearchDocumentStore. While this is a good choice for a production system, it can be quite cumbersome to set up and run during your "first minutes with haystack". It would be awesome to use the InMemoryDocumentStore instead for the first steps of users (Tutorial 1, Quick Start ...). The only thing that I see holding us back here: InMemoryDocumentStore doesn't support our BM25Retriever, which is a fast and easy retriever to get started with (and therefore a good choice in tutorials). Switching to another retriever might complicate tutorials, slow them down or reduce the quality of answers users get.
Describe the solution you'd like
Supporting the usage of the BM25Retriever in combination with the InMemoryDocumentStore
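To make the request more concrete, here is a rough, dependency-free sketch of what BM25 ranking over documents held in memory could look like. It is not Haystack code: the class TinyBM25, its method names, the whitespace tokenizer and the defaults k1=1.5 and b=0.75 are all assumptions made for illustration, and the IDF uses the common log(1 + ...) variant of Okapi BM25.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption; a real retriever
    # would likely use something more careful).
    return text.lower().split()


class TinyBM25:
    """Hypothetical in-memory BM25 ranker, for illustration only."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1 = k1
        self.b = b
        tokenized = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokenized]
        self.avgdl = sum(self.doc_len) / len(docs) if docs else 0.0
        # Per-document term frequencies.
        self.tf = [Counter(t) for t in tokenized]
        # Document frequency: in how many documents each term occurs.
        df = Counter()
        for t in tokenized:
            df.update(set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in df.items()
        }

    def score(self, query, i):
        # BM25 score of document i for the query.
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def retrieve(self, query, top_k=3):
        # Return up to top_k (document, score) pairs, best first,
        # skipping documents that share no term with the query.
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.docs[i], s) for s, i in scored[:top_k]]


docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "berlin is the capital of germany",
]
ranker = TinyBM25(docs)
results = ranker.retrieve("capital of germany")
```

A real integration would presumably keep the index up to date as documents are written to or deleted from the store, which this sketch does not attempt.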
Describe alternatives you've considered
Using the TFIDFRetriever, but I am concerned about the quality of results and about leading our users in the wrong direction here.
Priority
I don't think this feature is urgent, but it might be a helpful step when we want to improve the early user experience.