You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
In the coming rework of Tutorials, the first tutorial uses an InMemoryDocumentStore and a TFIDFRetriever. It would be preferable to use the BM25Retriever instead since it is an improvement over TFIDF but it is currently incompatible with the InMemoryDocumentStore.
Describe the solution you'd like
Implement BM25 retrieval that doesn't rely on the Elasticsearch implementation. This might involve using another BM25 Python library such as rank-bm25
The text was updated successfully, but these errors were encountered:
I think this is a duplicate of #3447. Let's keep that one only?
In practice many other document stores can hardly support BM25, and I'm not sure it's worth stretching the support too much right now. Implementing support for BM25 with the current arch in a docstore like, say, Pinecone is going to look quite ugly and, above all, very slow. This is because, to add support for BM25:
the index would need to be rebuilt every time the docstore is instantiated (slower proportionally to the docstore size), or
every document would need to be updated every time a new document is added, making adding documents slower and slower as the document store size increases (i.e. adding the Nth document is going to be N times slower than adding the 1st).
Or at least that's my understanding of it. Open to be corrected!
Note that this is all true for InMemory as well, but being it a "toy" docstore, i'm not concerned because it will never be used in production with big datasets and therefore such overheads are manageable.
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
The text was updated successfully, but these errors were encountered: