OpenSearch VectorStore cannot return more than 4 retrieved results #5212

gsdssn · 2023-05-24T20:49:47Z

Using the following script, I can retrieve at most 4 documents. For k = 1, 2, 3, 4, 5, 6, ..., similarity_search_with_score returns 1, 2, 3, 4, 4, 4, ... documents.

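```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import OpenSearchVectorSearch

opensearch_url = "xxxxxxxxx.com"
# docs: the ~90 Document objects to index; query: the search string
docsearch = OpenSearchVectorSearch.from_documents(docs,
                                                  embedding=HuggingFaceEmbeddings(),
                                                  opensearch_url=opensearch_url,
                                                  index_name="my_index_name")
# Asks for 10 (document, score) pairs, but only 4 come back
retrieved_docs = docsearch.similarity_search_with_score(query, k=10)
```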

This returns only 4 documents even though len(docs) is over 90. I tried various indexes and queries and confirmed that the issue persists.
I also found a related issue for Chroma, where results likewise max out at 4 regardless of k.
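The 1, 2, 3, 4, 4, 4 pattern can be reproduced with a toy model. This is a minimal pure-Python sketch, not OpenSearch itself: it assumes the graph search yields the k nearest neighbors, the cluster then returns at most `size` hits with `size` defaulting to 4 (as explained later in this thread), and the wrapper keeps at most k of them. The function name `simulated_knn_search` is hypothetical.

```python
import math
import random


def simulated_knn_search(doc_vectors, query_vector, k, size=4):
    """Toy model of an approximate k-NN request with separate k and size."""
    # Rank every document by Euclidean distance (stand-in for the graph search)
    ranked = sorted(
        range(len(doc_vectors)),
        key=lambda i: math.dist(doc_vectors[i], query_vector),
    )
    neighbors = ranked[:k]   # the graph search returns k neighbors
    hits = neighbors[:size]  # the response holds at most `size` hits
    return hits[:k]          # the wrapper keeps at most k of those


random.seed(0)
docs = [[random.random(), random.random()] for _ in range(90)]
query = [0.5, 0.5]
counts = [len(simulated_knn_search(docs, query, k)) for k in range(1, 7)]
print(counts)  # [1, 2, 3, 4, 4, 4]
```

In this model, passing `size=k` makes the result count track k up to the number of indexed documents.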


gsdssn · 2023-05-24T21:38:41Z

Using search_type="painless_scripting" fixes this issue, so it seems to affect only approximate search. I tried both the "faiss" and "nmslib" engines, and both return at most 4 documents.
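One plausible reason painless scripting is unaffected: it is exact search. A `script_score` query scores every matching document against the query vector instead of walking a k-NN graph, so no per-graph `k` is involved and the hit count is governed by `size` (OpenSearch returns 10 hits when `size` is omitted). Below is a sketch of such a request body, assuming the OpenSearch k-NN plugin's Painless extension `l2Squared` and a vector field named `vector_field`; the helper name is hypothetical, and this is not necessarily the exact body LangChain sends.

```python
import json


def exact_script_score_query(query_vector, size, vector_field="vector_field"):
    """Build an exact (brute-force) k-NN request using a Painless script."""
    return {
        "size": size,  # number of hits OpenSearch returns
        "query": {
            "script_score": {
                "query": {"match_all": {}},  # score every document
                "script": {
                    # Turn squared L2 distance into a similarity score
                    "source": "1 / (1 + l2Squared(params.query_value, doc[params.field]))",
                    "params": {"field": vector_field, "query_value": list(query_vector)},
                },
            }
        },
    }


print(json.dumps(exact_script_score_query([0.1, 0.2, 0.3], size=10), indent=2))
```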

naveentatikonda · 2023-05-24T23:43:32Z

@gsdssn for approximate search you also need to set another parameter, size, which, like k, defaults to 4.

k is the number of neighbors that the search of each graph returns. You must also include the size option, which indicates how many results the query actually returns.
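In the OpenSearch k-NN query DSL the two values sit in different places: `k` inside the `knn` clause, `size` at the top level of the request. A minimal sketch of building that body, assuming a vector field named `vector_field` and defaults of 4 for both values as described in the comment above; the helper name is hypothetical.

```python
def approximate_knn_query(query_vector, k=4, size=4, vector_field="vector_field"):
    """Build an approximate k-NN request body with independent k and size."""
    return {
        "size": size,  # how many hits the query actually returns
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,  # neighbors each graph search returns
                }
            }
        },
    }


# Raising only k leaves size at its default, so at most 4 hits come back
body = approximate_knn_query([0.1, 0.2, 0.3], k=10)
print(body["size"], body["query"]["knn"]["vector_field"]["k"])  # 4 10
```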

With this PR, which Davis created, you won't need to set the size parameter; it will be the same as k.


For most queries it's the `size` parameter that determines the final number of documents returned. Since our abstractions refer to this value as `k`, set `size` to `k` everywhere instead of expecting a separate parameter. It would be great to have someone more familiar with OpenSearch validate that this is reasonable (e.g., that having `size` and what OpenSearch calls `k` be the same won't lead to any strange behavior). cc @naveentatikonda Closes #5212
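The change this PR describes can be sketched as a request builder that takes a single `k` and writes it to both places, so callers never pass `size` separately. This illustrates the idea, assuming the OpenSearch k-NN query DSL (`size` at the top level, `k` inside the `knn` clause) and a field named `vector_field`; it is not the code merged in #5216, and the function name is hypothetical.

```python
def knn_query_with_k_as_size(query_vector, k=4, vector_field="vector_field"):
    """Approximate k-NN request where the returned hit count follows k."""
    return {
        "size": k,  # final number of hits now equals k
        "query": {"knn": {vector_field: {"vector": list(query_vector), "k": k}}},
    }


for k in (1, 4, 10):
    body = knn_query_with_k_as_size([0.1, 0.2, 0.3], k=k)
    assert body["size"] == body["query"]["knn"]["vector_field"]["k"] == k
```

With a single parameter, the caller's k can no longer disagree with the number of hits requested; whether tying the two together has side effects is exactly the question the PR puts to OpenSearch maintainers.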


dev2049 mentioned this issue May 24, 2023

OpenSearch top k parameter fix #5216 (merged)

dev2049 closed this as completed in #5216 May 25, 2023
