OpenSearch VectorStore cannot return more than 4 retrieved results. #5212

Closed
gsdssn opened this issue May 24, 2023 · 2 comments · Fixed by #5216

Comments

@gsdssn

gsdssn commented May 24, 2023

Using the following script, I can retrieve at most 4 documents. With k = 1, 2, 3, 4, 5, 6, ..., similarity_search_with_score returns 1, 2, 3, 4, 4, 4, ... docs.

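from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import OpenSearchVectorSearch

# docs (a list of documents) and query (a string) are defined elsewhere.
opensearch_url = "xxxxxxxxx.com"
docsearch = OpenSearchVectorSearch.from_documents(docs,
                                                  embedding=HuggingFaceEmbeddings(),
                                                  opensearch_url=opensearch_url,
                                                  index_name="my_index_name")
retrieved_docs = docsearch.similarity_search_with_score(query, k=10)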

This returns only 4 documents even though len(docs) is over 90. I tried various indexes and various queries and confirmed the issue is persistent.
I also found a related issue for Chroma, which likewise maxes out at 4 regardless of k.

@gsdssn
Author

gsdssn commented May 24, 2023

Using search_type="painless_scripting" fixes this, so it looks like the issue only affects approximate search. I tried both the "faiss" and "nmslib" engines; with either one, the maximum number of retrieved documents is 4.

@naveentatikonda
Contributor

@gsdssn For approximate search you also need to set another parameter, size, which defaults to 4 just like k.

k is the number of neighbors the search of each graph will return. You must also include the size option, which indicates how many results the query actually returns.

With this PR, which Davis created, you don't need to set the size parameter; it will be the same as k.
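
To make the distinction between size and k concrete, here is a minimal sketch of the request body an approximate k-NN search sends to OpenSearch. The helper build_knn_query and the field name vector_field are hypothetical names used for illustration only, not LangChain's actual internals; the defaults of 4 mirror the ones described in this thread.

```python
from typing import Any, Dict, List


def build_knn_query(
    query_vector: List[float],
    k: int = 4,
    size: int = 4,
    vector_field: str = "vector_field",  # hypothetical field name
) -> Dict[str, Any]:
    """Build an OpenSearch approximate k-NN query body.

    k is the number of neighbors requested from the graph search;
    size caps how many hits the query actually returns.
    """
    return {
        "size": size,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


# Raising only k leaves size at its default, so the response stays capped.
capped = build_knn_query([0.1, 0.2, 0.3], k=10)

# Raising both lets all 10 neighbors through.
uncapped = build_knn_query([0.1, 0.2, 0.3], k=10, size=10)
```

In the first body size is still 4 even though k is 10, which is the situation reported above. Before the PR, the workaround would be to pass size together with k for approximate search.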

dev2049 added a commit that referenced this issue May 25, 2023
For most queries it's the `size` parameter that determines final number
of documents to return. Since our abstractions refer to this as `k`, set
this to be `k` everywhere instead of expecting a separate param. Would
be great to have someone more familiar with OpenSearch validate that
this is reasonable (e.g. that having `size` and what OpenSearch calls
`k` be the same won't lead to any strange behavior). cc @naveentatikonda

Closes #5212
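
The effect of the change on the number of returned documents can be sketched with a simplified model. It assumes a single shard and segment, so the hit count is just the smallest of k, size, and the number of indexed documents; returned_count is a hypothetical helper, not part of LangChain or OpenSearch.

```python
def returned_count(k: int, size: int, n_docs: int) -> int:
    """Simplified model: hits are capped by k, by size, and by the index size."""
    return min(k, size, n_docs)


# Before the fix: size stays at its default of 4 whatever k is.
before = [returned_count(k, size=4, n_docs=90) for k in range(1, 7)]

# After the fix: size is set equal to k.
after = [returned_count(k, size=k, n_docs=90) for k in range(1, 7)]

print(before)  # [1, 2, 3, 4, 4, 4]
print(after)   # [1, 2, 3, 4, 5, 6]
```

The first list matches the counts reported at the top of this issue; the second is what tying size to k is meant to produce.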
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this issue Jun 19, 2023