Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: interfaces for async embeddings, implement async openai #6111

Closed

Conversation

tyree731
Copy link
Contributor

This change adds support to the base Embeddings class for two methods, aembed_query and aembed_documents, those two methods supporting async equivalents of embed_query and
embed_documents respectively. This ever so slightly rounds out async support within langchain, with an initial implementation of this functionality being implemented for openai.

Implements #6109

This change adds support to the base `Embeddings` class for two
methods, `aembed_query` and `aembed_documents`, those two methods
supporting async equivalents of `embed_query` and
`embed_documents` respectively. This ever so slightly rounds out
async support within langchain, with an initial implementation
of this functionality being implemented for openai.

Implements langchain-ai#6109
@tyree731
Copy link
Contributor Author

CC @hwchase17 @dev2049 for review. the _aget_len_safe_embeddings bit is a bit rough, as I couldn't think of a great way to reuse the non-embedding work from _get_len_safe_embeddings, open to suggestions there.

@tyree731
Copy link
Contributor Author

Let me know if there is anything I can do to help push this along. I know you're all busy, but just want to make sure there isn't anything on my end holding this up. Also:

  • I wasn’t 100% on where I should be adding more documentation, so if I should be adding more documentation, kindly point me in the right direction there.
  • I added integration tests for openai async embeddings, but if I should be adding tests for fake embeddings async or some such, I can do so.

Thanks!

langchain/embeddings/base.py Outdated Show resolved Hide resolved
@@ -53,6 +54,38 @@ def _create_retry_decorator(embeddings: OpenAIEmbeddings) -> Callable[[Any], Any
)


def _async_retry_decorator(embeddings: OpenAIEmbeddings) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is mirroring the sync decorator which is also taking embeddings as an argument (should be taking retry parameters not embeddings), but OK as it's mirroring existing functionality

langchain/embeddings/openai.py Outdated Show resolved Hide resolved
@eyurtsev
Copy link
Collaborator

Hi @tyree731 , code looks good -- let's remove the abstractmethod to reduce scope and avoid breaking changes and then code is good to merge.

@vercel
Copy link

vercel bot commented Jun 16, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ❌ Failed (Inspect) Jun 16, 2023 1:26am

@vercel vercel bot temporarily deployed to Preview June 16, 2023 01:26 Inactive
@tyree731
Copy link
Contributor Author

Updated based on your comments. Let me know if there's anything I need to do for the vercel failure.

Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add these methods to the base class, but lets just do something like

    async def aembed_documents(
        self, texts: List[str], chunk_size: Optional[int] = 0
    ) -> List[List[float]]:
    raise NotImplementedError

In the base class (rather than mark as abstract and have each subclass have to implement that)

@dev2049 dev2049 added the 03 enhancement Enhancement of existing functionality label Jun 21, 2023
@tyree731
Copy link
Contributor Author

Sorry for the delay in responding @hwchase17 , currently on vacation in Germany for the next couple of weeks, so I'll get to your comments when I get back.

hwchase17 pushed a commit that referenced this pull request Jun 22, 2023
Since it seems like #6111 will be blocked for a bit, I've forked
@tyree731's fork and implemented the requested changes.

This change adds support to the base Embeddings class for two methods,
aembed_query and aembed_documents, those two methods supporting async
equivalents of embed_query and
embed_documents respectively. This ever so slightly rounds out async
support within langchain, with an initial implementation of this
functionality being implemented for openai.

Implements #6109

---------

Co-authored-by: Stephen Tyree <tyree731@gmail.com>
@hwchase17 hwchase17 closed this Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
03 enhancement Enhancement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants