[BUG] LangChain OpenSearchVectorSearch broken with 2.4.1 w/AOSS #600
Comments
I can reproduce this against OpenSearch Serverless w/2.4.1. Code in https://github.com/dblock/opensearch-langchain.
Local OpenSearch
This works with all of OSS OpenSearch, and Amazon Managed OpenSearch and Serverless.
#!/usr/bin/env python3
from os import environ
from typing import List
from langchain.vectorstores import OpenSearchVectorSearch
from langchain.schema.embeddings import Embeddings
fake_texts = ["foo", "bar", "baz"]
class FakeEmbeddings(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(1.0)] * 9 + [float(i)] for i in range(len(texts))]

    def embed_query(self, text: str) -> List[float]:
        return [float(1.0)] * 9 + [float(0.0)]
docsearch = OpenSearchVectorSearch.from_texts(
    fake_texts,
    FakeEmbeddings(),
    opensearch_url='https://localhost:9200',
    verify_certs=False,
    http_auth=("admin", "admin")
)
OpenSearchVectorSearch.add_texts(
    docsearch, fake_texts, vector_field="my_vector", text_field="custom_text"
)
Amazon OpenSearch
This works with OSS OpenSearch and Amazon Managed OpenSearch, but not with Serverless on 2.4.1.
#!/usr/bin/env python3
import logging
from os import environ
from typing import List
from urllib.parse import urlparse
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection, __versionstr__
from langchain.vectorstores import OpenSearchVectorSearch
from langchain.schema.embeddings import Embeddings
from boto3 import Session
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.INFO)
opensearch_url = environ['ENDPOINT']
url = urlparse(opensearch_url)
region = environ.get('AWS_REGION', 'us-east-1')
service = environ.get('SERVICE', 'es')
credentials = Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, service)
print(f"Using opensearch-py {__versionstr__}")
fake_texts = ["foo", "bar", "baz"]
class FakeEmbeddings(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(1.0)] * 9 + [float(i)] for i in range(len(texts))]

    def embed_query(self, text: str) -> List[float]:
        return [float(1.0)] * 9 + [float(0.0)]
docsearch = OpenSearchVectorSearch.from_texts(
    fake_texts,
    FakeEmbeddings(),
    opensearch_url=opensearch_url,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    http_auth=auth
)
OpenSearchVectorSearch.add_texts(
    docsearch, fake_texts, vector_field="my_vector", text_field="custom_text"
)
Uses bulk
@dblock I remember we added a check in langchain to identify the difference between AOSS and AOS. Ref: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/vectorstores/opensearch_vector_search.py#L83-L91. It was done using does that get changed?
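For readers following along, a check of this kind can be sketched roughly as below. This is only an illustration, assuming the detection reads a `service` attribute from the object passed as `http_auth`; `FakeSignerAuth` and `looks_like_aoss` are hypothetical names, not the LangChain or opensearch-py API.

```python
from typing import Any


class FakeSignerAuth:
    """Hypothetical stand-in for a request signer; not the opensearch-py class."""

    def __init__(self, service: str) -> None:
        # Assumption: the signer exposes the service name as a public attribute.
        self.service = service


def looks_like_aoss(http_auth: Any) -> bool:
    """Return True when the auth object reports the serverless service name."""
    # getattr with a default avoids an AttributeError for auth objects
    # (such as a plain user/password tuple) that have no `service` attribute.
    return getattr(http_auth, "service", None) == "aoss"


print(looks_like_aoss(FakeSignerAuth("aoss")))
print(looks_like_aoss(FakeSignerAuth("es")))
print(looks_like_aoss(("admin", "admin")))
```

Note that a check written this way quietly returns False if the auth object stops exposing the attribute, so a serverless endpoint would then be treated like a regular one.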
I bisected this to #547. We are sending different data!
@dblock this is interesting. Are we raising a PR to fix this or reverting the change?
The problem is here: https://github.com/langchain-ai/langchain/blob/b4312aac5c0567088353178fb70fdb356b372e12/libs/langchain/langchain/vectorstores/opensearch_vector_search.py#L83 The code relies on
Fix in #603.
@dblock it's not necessarily related to AOSS. Even in AOS/OSS, if the user is asking the OpenSearch service to generate the ids for the data, the OS client should not send _id in the bulk request. We should respect the customer input, and it seems like urllib3 is not respecting it.
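To make the point about ids concrete, here is a minimal sketch of building bulk actions where `_id` is included only when the caller supplies one. `build_bulk_actions` and the `text` field are hypothetical; the dict layout assumes the action format accepted by the `opensearchpy.helpers.bulk` helper, and nothing here is taken from the LangChain or opensearch-py source.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_bulk_actions(
    index: str,
    texts: Sequence[str],
    ids: Optional[Sequence[str]] = None,
) -> List[Dict[str, Any]]:
    """Build one index action per text, adding `_id` only if ids were given."""
    if ids is not None and len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")

    actions: List[Dict[str, Any]] = []
    for position, text in enumerate(texts):
        action: Dict[str, Any] = {"_op_type": "index", "_index": index, "text": text}
        if ids is not None:
            # Caller-chosen id; when omitted, the server is left to generate one.
            action["_id"] = ids[position]
        actions.append(action)
    return actions


server_generated = build_bulk_actions("my-index", ["foo", "bar"])
caller_supplied = build_bulk_actions("my-index", ["foo", "bar"], ids=["a", "b"])
print(server_generated)
print(caller_supplied)
```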
I don't believe that's correct, see #603 (comment)
We've released 2.4.2 with a fix.
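If you want to guard against the affected releases in your own code, something like the following could work. It is a rough sketch based only on what this thread reports (2.3.2 works, 2.4.0 and 2.4.1 fail, 2.4.2 contains the fix); `parse_version` and `is_affected` are hypothetical helpers and assume a plain numeric `major.minor.patch` version string such as the one printed from `__versionstr__` in the repro above.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'major.minor.patch' into a tuple of ints for comparison."""
    # Assumption: no pre-release suffixes such as 'rc1'; int() would fail on those.
    return tuple(int(part) for part in version.split(".")[:3])


def is_affected(version: str) -> bool:
    """True for the releases this thread reports as broken (2.4.0 and 2.4.1)."""
    return (2, 4, 0) <= parse_version(version) < (2, 4, 2)


for candidate in ("2.3.2", "2.4.0", "2.4.1", "2.4.2"):
    print(candidate, is_affected(candidate))
```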
Hi, I am running into the exact same issue in the JavaScript SDK. Is there a way you guys can fix it there as well? @dblock
Can you please open an issue in opensearch-js?
What is the bug?
A clear and concise description of the bug.
When using LangChain OpenSearchVectorSearch with OpenSearch Serverless and doing:
OR
I get the error
This same code works fine on 2.3.2.
How can one reproduce the bug?
Steps to reproduce the behavior.
Create an "OpenSearchVectorSearch" in LangChain when using opensearch-py 2.4.0, and try to upload documents to it.
What is the expected behavior?
A clear and concise description of what you expected to happen.
Documents upload successfully
What is your host/environment?
Operating system, version.
Python Lambda image. Specifically
public.ecr.aws/lambda/python:3.9
Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
No
Do you have any additional context?
Add any other context about the problem.
No