Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models #3401

Merged: dev2049 merged 22 commits into langchain-ai:master on May 23, 2023

jeffvestal

This PR introduces a new module, elasticsearch_embeddings.py, which provides a wrapper around Elasticsearch embedding models. The new ElasticsearchEmbeddings class allows users to generate embeddings for documents and query texts using a model deployed in an Elasticsearch cluster.

Main features:

  1. The ElasticsearchEmbeddings class is initialized with an Elasticsearch connection object and a model_id, and runs inference by calling infer_trained_model on the Elasticsearch ML client.
  2. The embed_documents() method generates embeddings for a list of documents, and the embed_query() method generates an embedding for a single query text.
  3. The class supports custom input text field names in case the deployed model expects a different field name than the default text_field.
  4. The implementation is compatible with any model deployed in Elasticsearch that generates embeddings as output.
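
The features above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `SketchElasticsearchEmbeddings` and `FakeMlClient` are hypothetical names, and the stub client only mimics the `inference_results` / `predicted_value` response shape of `infer_trained_model` so the example runs without a cluster.

```python
from typing import Any, Dict, List


class SketchElasticsearchEmbeddings:
    """Illustrative stand-in for the wrapper described above (assumed shape)."""

    def __init__(self, client: Any, model_id: str, input_field: str = "text_field") -> None:
        self.client = client            # object exposing infer_trained_model()
        self.model_id = model_id        # ID of the deployed embedding model
        self.input_field = input_field  # field name the deployed model expects

    def _embed(self, texts: List[str]) -> List[List[float]]:
        # One inference call for the whole batch; each text is wrapped in a
        # document keyed by the configured input field.
        response = self.client.infer_trained_model(
            model_id=self.model_id,
            docs=[{self.input_field: text} for text in texts],
        )
        return [result["predicted_value"] for result in response["inference_results"]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return self._embed(texts)

    def embed_query(self, text: str) -> List[float]:
        return self._embed([text])[0]


class FakeMlClient:
    """Hypothetical offline stub; returns a toy 2-dimensional vector per document."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def infer_trained_model(self, *, model_id: str, docs: List[Dict[str, str]]) -> Dict[str, Any]:
        self.calls.append({"model_id": model_id, "docs": docs})
        results = []
        for doc in docs:
            text = next(iter(doc.values()))
            results.append({"predicted_value": [float(len(text)), 1.0]})
        return {"inference_results": results}


client = FakeMlClient()
embedder = SketchElasticsearchEmbeddings(client, "my-model", input_field="body")
doc_vectors = embedder.embed_documents(["hello", "hi"])  # [[5.0, 1.0], [2.0, 1.0]]
query_vector = embedder.embed_query("hey")               # [3.0, 1.0]
```

Against a real cluster, the stub would be replaced by an actual ML client, and `input_field` would only need to change when the deployed model expects something other than `text_field`.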

Benefits:

  1. Simplifies the process of generating embeddings using Elasticsearch models.
  2. Provides a clean and intuitive interface to interact with the Elasticsearch ML client.
  3. Allows users to easily integrate Elasticsearch-generated embeddings.

This is my first PR for this project.
I created an integration test file; however, I could use some guidance on how to set it up, since it needs an Elasticsearch cluster running an embedding model.
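
One common way to handle a test that needs a live cluster is to skip it unless connection settings are supplied through the environment. The sketch below assumes this approach; it is not the test file from this PR, and the environment variable names (`ES_CLOUD_ID`, `ES_USER`, `ES_PASSWORD`, `ES_MODEL_ID`) are hypothetical.

```python
import os
import unittest

# Hypothetical variable names; the PR does not define any.
REQUIRED_ENV = ("ES_CLOUD_ID", "ES_USER", "ES_PASSWORD", "ES_MODEL_ID")


def missing_settings(env=None):
    """Return the required settings that are absent or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


class ElasticsearchInferenceIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Skip rather than fail when no cluster has been configured.
        missing = missing_settings()
        if missing:
            self.skipTest("missing settings: " + ", ".join(missing))

    def test_model_returns_one_vector_per_document(self):
        # Imported here so the module still loads without the elasticsearch package.
        from elasticsearch import Elasticsearch
        from elasticsearch.client import MlClient

        es = Elasticsearch(
            cloud_id=os.environ["ES_CLOUD_ID"],
            basic_auth=(os.environ["ES_USER"], os.environ["ES_PASSWORD"]),
        )
        response = MlClient(es).infer_trained_model(
            model_id=os.environ["ES_MODEL_ID"],
            docs=[{"text_field": "foo"}, {"text_field": "bar"}],
        )
        vectors = [r["predicted_value"] for r in response["inference_results"]]
        self.assertEqual(len(vectors), 2)
        self.assertGreater(len(vectors[0]), 0)
```

With this layout the test is reported as skipped on machines without credentials, and runs for real wherever the four settings point at a cluster with a deployed embedding model.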

Let me know if there are any structural changes needed or anything missing.

Related issue #3400

dev2049 commented May 15, 2023

@jeffvestal would it be possible to add an example notebook to docs/modules/models/text_embedding/examples? can be as simple as the example in the constructor docstring. otherwise LGTM!

jeffvestal commented

@dev2049 Definitely. It might be a couple days as I'm traveling this week but I'll get something in there.

dev2049 added the "needs documentation" label (PR needs to be updated with documentation) on May 22, 2023
dev2049 added the "lgtm" label (PR looks good. Use to confirm that a PR is ready for merging.) and removed the "needs documentation" label on May 22, 2023
dev2049 merged commit 0b542a9 into langchain-ai:master on May 23, 2023
vowelparrot pushed a commit that referenced this pull request May 24, 2023
…sticsearch models (#3401)

This PR introduces a new module, `elasticsearch_embeddings.py`, which
provides a wrapper around Elasticsearch embedding models. The new
ElasticsearchEmbeddings class allows users to generate embeddings for
documents and query texts using a [model deployed in an Elasticsearch
cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding).

### Main features:

1. The ElasticsearchEmbeddings class initializes with an Elasticsearch
connection object and a model_id, providing an interface to interact
with the Elasticsearch ML client through
[infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model).
2. The `embed_documents()` method generates embeddings for a list of
documents, and the `embed_query()` method generates an embedding for a
single query text.
3. The class supports custom input text field names in case the deployed
model expects a different field name than the default `text_field`.
4. The implementation is compatible with any model deployed in
Elasticsearch that generates embeddings as output.

### Benefits:

1. Simplifies the process of generating embeddings using Elasticsearch
models.
2. Provides a clean and intuitive interface to interact with the
Elasticsearch ML client.
3. Allows users to easily integrate Elasticsearch-generated embeddings.

Related issue #3400

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
jeffvestal deleted the elasticsearch_embeddings branch May 24, 2023 06:48
jeffvestal mentioned this pull request May 24, 2023
dev2049 pushed a commit that referenced this pull request May 24, 2023
danielchalef mentioned this pull request Jun 5, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
This was referenced Jun 25, 2023