
OpenAIEmbeddings: Add an optional parameter to skip empty embeddings #10196

Merged: 2 commits into langchain-ai:master on Sep 4, 2023

Conversation

ElReyZero (Contributor)

Description

Issue

This pull request addresses a lingering issue identified in PR #7070. That PR introduced a retry mechanism for embedding requests to address the problem of empty embeddings returned by the OpenAIEmbeddings class, but it didn't fully resolve the issue: empty embeddings still occasionally slipped through.

Problem

In certain use cases, the OpenAI API returns empty embeddings. These can sometimes be skipped or removed without affecting the application, but they are not always resolved by retries, and their presence can break applications that rely on the OpenAIEmbeddings class.

Solution

To handle empty embeddings more robustly, we propose an optional parameter, skip_empty, on the OpenAIEmbeddings class. When set to True, empty embeddings are automatically skipped so that they do not disrupt the processing flow. Developers can toggle this behavior as needed without disrupting their application.

Changes Made

  • Added an optional parameter, skip_empty, to the OpenAIEmbeddings class.
  • When skip_empty is set to True, empty embeddings are automatically skipped without causing errors or disruptions (see the sketch after this list).
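
For context, here is a minimal sketch of the kind of guard this parameter enables. The filter_empty_embeddings helper below is illustrative only, not the actual LangChain internals; the real check lives inside the response handling of the embeddings module.

from typing import List

def filter_empty_embeddings(
    response_data: List[dict], skip_empty: bool = False
) -> List[List[float]]:
    """Illustrative helper: drop entries whose embedding vector is empty."""
    vectors = []
    for item in response_data:
        embedding = item.get("embedding", [])
        if not embedding:
            if skip_empty:
                continue  # silently drop the empty embedding
            raise ValueError("Received an empty embedding from the API")
        vectors.append(embedding)
    return vectors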

Example Usage

from langchain.embeddings import OpenAIEmbeddings

# Initialize the OpenAIEmbeddings class with skip_empty=True
embeddings = OpenAIEmbeddings(openai_api_key="your_api_key", skip_empty=True)

# Request embeddings; empty embeddings are skipped automatically.
# docs is a list of already-split text chunks.
results = embeddings.embed_documents(docs)

# Process results without interruption from empty embeddings


@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🤖:improvement Medium size change to existing code to handle new use-cases labels Sep 4, 2023
@hwchase17 hwchase17 merged commit 5dbae94 into langchain-ai:master Sep 4, 2023
@benprofitt

benprofitt commented Sep 12, 2023

This change in 0.0.283 breaks backwards compatibility for me. When unpickling objects that used OpenAIEmbeddings and trying to use them, I get errors related to the presence of skip_empty :(

I don't know if this is the right place for this, but thought I should let someone know!

Error:

[ERROR] AttributeError: 'OpenAIEmbeddings' object has no attribute 'skip_empty'
Traceback (most recent call last):
  File "/var/task/docs.py", line 38, in get_related_chunks_and_metadata_faiss
    results = faiss_index.similarity_search_with_relevance_scores(query, k=k)
  File "/var/task/langchain/vectorstores/base.py", line 247, in similarity_search_with_relevance_scores
    docs_and_similarities = self._similarity_search_with_relevance_scores(
  File "/var/task/langchain/vectorstores/faiss.py", line 764, in _similarity_search_with_relevance_scores
    docs_and_scores = self.similarity_search_with_score(
  File "/var/task/langchain/vectorstores/faiss.py", line 275, in similarity_search_with_score
    embedding = self.embedding_function(query)
  File "/var/task/langchain/embeddings/openai.py", line 511, in embed_query
    return self.embed_documents([text])[0]
  File "/var/task/langchain/embeddings/openai.py", line 483, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
  File "/var/task/langchain/embeddings/openai.py", line 367, in _get_len_safe_embeddings
    response = embed_with_retry(
  File "/var/task/langchain/embeddings/openai.py", line 107, in embed_with_retry
    return _embed_with_retry(**kwargs)
  File "/var/task/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/task/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/task/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/var/task/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/task/langchain/embeddings/openai.py", line 105, in _embed_with_retry
    return _check_response(response, skip_empty=embeddings.skip_empty)

@ElReyZero (Contributor, Author)

ElReyZero commented Sep 21, 2023

@benprofitt That is sadly a consequence of using pickled objects: they are completely dependent on the exact versions of the libraries and code you used at the time you dumped the object.
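
For anyone hitting this, one possible workaround is to backfill the missing attribute on the unpickled object before using it. The sketch below assumes a pickle file path of your own and that the unpickled object is a pydantic v1 model (as OpenAIEmbeddings was at the time), where object.__setattr__ bypasses field validation; re-creating and re-pickling the embeddings object on the new version is the more robust fix.

import pickle

# Load an object pickled under an older langchain version.
with open("embeddings.pkl", "rb") as f:  # hypothetical path
    embeddings = pickle.load(f)

# Backfill the attribute that langchain 0.0.283+ expects.
# object.__setattr__ bypasses pydantic's validation on the old instance.
if not hasattr(embeddings, "skip_empty"):
    object.__setattr__(embeddings, "skip_empty", False)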
