community[patch]: Update ElasticSearch mappings to successfully add documents from TextSplitter (#3629)

* failing test that shows how the loc format from text splitters conflicts with elasticsearch mappings

* explicitly declare .metadata.loc as an object in elasticsearch

* Throw an error if inserting vectors into elasticsearch fails.

* Lint + docs format

* Update docs

* Format

---------

Co-authored-by: Brace Sproul <braceasproul@gmail.com>
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
3 people authored Dec 15, 2023
1 parent d8a3afc commit 7b06f73
Showing 4 changed files with 48 additions and 7 deletions.
@@ -19,7 +19,8 @@ LangChain.js accepts [@elastic/elasticsearch](https://github.com/elastic/elastic
 npm install -S @elastic/elasticsearch
 ```

-You'll also need to have an Elasticsearch instance running. You can use the [official Docker image](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) to get started, or you can use [Elastic Cloud](https://www.elastic.co/cloud/) the official cloud service provided by Elastic.
+You'll also need to have an Elasticsearch instance running.
+You can use the [official Docker image](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) to get started, or you can use [Elastic Cloud](https://www.elastic.co/cloud/), Elastic's official cloud service.

 For connecting to Elastic Cloud you can read the documentation reported [here](https://www.elastic.co/guide/en/kibana/current/api-keys.html) for obtaining an API key.
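
Reviewer note: the docs text above mentions both a local Docker instance and Elastic Cloud with an API key. A minimal sketch of how the two connection styles might be configured follows. The helper name, its parameters, and the placeholder endpoint and key are hypothetical and not part of this commit; only the `node` and `auth` option names are meant to mirror the `@elastic/elasticsearch` client options.

```typescript
// Hypothetical shapes for illustration; the real option types are exported
// by @elastic/elasticsearch (ClientOptions).
interface ElasticAuthSketch {
  apiKey?: string;
  username?: string;
  password?: string;
}

interface ElasticClientConfigSketch {
  node: string;
  auth?: ElasticAuthSketch;
}

// Prefer an API key when one is supplied, fall back to basic auth,
// and omit `auth` entirely for an unsecured local instance.
function buildClientConfig(
  node: string,
  apiKey?: string,
  username?: string,
  password?: string
): ElasticClientConfigSketch {
  if (apiKey) {
    return { node, auth: { apiKey } };
  }
  if (username && password) {
    return { node, auth: { username, password } };
  }
  return { node };
}

// Local Docker instance (assumed default port).
const localConfig = buildClientConfig("http://127.0.0.1:9200");

// Elastic Cloud style: both values are placeholders, not real credentials.
const cloudConfig = buildClientConfig(
  "https://example-deployment.invalid:443",
  "YOUR_API_KEY"
);
```

An object of this shape is intended to be passed to `new Client(...)`; the client documentation remains the authority on the exact options.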

@@ -2,11 +2,12 @@ import { Client, ClientOptions } from "@elastic/elasticsearch";
 import { Document } from "langchain/document";
 import { OpenAI } from "langchain/llms/openai";
 import { OpenAIEmbeddings } from "langchain/embeddings/openai";
+import { VectorDBQAChain } from "langchain/chains";
+
 import {
   ElasticClientArgs,
   ElasticVectorSearch,
-} from "langchain/vectorstores/elasticsearch";
-import { VectorDBQAChain } from "langchain/chains";
+} from "@langchain/community/vectorstores/elasticsearch";

 // to run this first run Elastic's docker-container with `docker-compose up -d --build`
 export async function run() {
21 changes: 17 additions & 4 deletions libs/langchain-community/src/vectorstores/elasticsearch.ts
@@ -135,7 +135,13 @@ export class ElasticVectorSearch extends VectorStore {
         text: documents[idx].pageContent,
       },
     ]);
-    await this.client.bulk({ refresh: true, operations });
+    const results = await this.client.bulk({ refresh: true, operations });
+    if (results.errors) {
+      const reasons = results.items.map(
+        (result) => result.index?.error?.reason
+      );
+      throw new Error(`Failed to insert documents:\n${reasons.join("\n")}`);
+    }
     return documentIds;
   }
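
Reviewer note: the hunk above maps over every bulk item, including ones that succeeded, so `reasons` can contain `undefined` entries, which `join` renders as empty lines. A possible refinement, sketched below, keeps only the reasons that are present. This is not part of the commit; the interface and function names are hypothetical and model only the fields the diff reads.

```typescript
// Hypothetical minimal shapes mirroring only the fields used in the diff;
// the real response types come from @elastic/elasticsearch.
interface BulkItemSketch {
  index?: { error?: { reason?: string } };
}

interface BulkResponseSketch {
  errors: boolean;
  items: BulkItemSketch[];
}

// Collect only the reasons that actually exist, so successful items
// do not contribute blank lines to the error message.
function failureReasons(results: BulkResponseSketch): string[] {
  return results.items
    .map((item) => item.index?.error?.reason)
    .filter((reason): reason is string => typeof reason === "string");
}

// Throw in the same style as the diff when the bulk call reports errors.
function assertBulkSucceeded(results: BulkResponseSketch): void {
  if (results.errors) {
    throw new Error(
      `Failed to insert documents:\n${failureReasons(results).join("\n")}`
    );
  }
}

// Illustrative response: one successful item and one failed item.
// The reason text is made up for the example.
const sampleResponse: BulkResponseSketch = {
  errors: true,
  items: [
    { index: {} },
    { index: { error: { reason: "illustrative failure reason" } } },
  ],
};
```

Calling `assertBulkSucceeded(sampleResponse)` would throw with a single reason line, while a response with `errors: false` passes through silently.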

@@ -266,16 +272,23 @@ export class ElasticVectorSearch extends VectorStore {
       mappings: {
         dynamic_templates: [
           {
-            // map all metadata properties to be keyword
-            "metadata.*": {
+            // map all metadata properties to be keyword except loc
+            metadata_except_loc: {
               match_mapping_type: "*",
+              match: "metadata.*",
+              unmatch: "metadata.loc",
               mapping: { type: "keyword" },
             },
           },
         ],
         properties: {
           text: { type: "text" },
-          metadata: { type: "object" },
+          metadata: {
+            type: "object",
+            properties: {
+              loc: { type: "object" }, // explicitly define loc as an object
+            },
+          },
           embedding: {
             type: "dense_vector",
             dims: dimension,
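
Reviewer note: for readers who want the post-change state without the diff markers, the fragment below restates the metadata part of the new mapping as a standalone object. It is a sketch limited to the fields visible in the hunk; the `embedding` settings are cut off in the diff and are deliberately omitted. The function name is hypothetical.

```typescript
// Restates only what the hunk shows; nothing here is inferred beyond it.
function buildMetadataMappingSketch() {
  return {
    dynamic_templates: [
      {
        // Metadata fields other than loc are mapped as keyword.
        metadata_except_loc: {
          match_mapping_type: "*",
          match: "metadata.*",
          unmatch: "metadata.loc",
          mapping: { type: "keyword" },
        },
      },
    ],
    properties: {
      text: { type: "text" },
      metadata: {
        type: "object",
        properties: {
          // loc is declared explicitly so nested values such as
          // { lines: { from, to } } are not forced into keyword.
          loc: { type: "object" },
        },
      },
    },
  };
}

const metadataMapping = buildMetadataMappingSketch();
```

One point that may be worth confirming against the Elasticsearch dynamic templates reference: as far as I know, `match` and `unmatch` are described there as testing the field name, while `path_match` and `path_unmatch` test the full dotted path, so the patterns `metadata.*` and `metadata.loc` may not behave exactly as the comment suggests.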
@@ -107,4 +107,30 @@ describe("ElasticVectorSearch", () => {
     const results = await store.similaritySearch("*", 11);
     expect(results).toHaveLength(11);
   });
+
+  test.skip("ElasticVectorSearch integration with text splitting metadata", async () => {
+    const createdAt = new Date().getTime();
+    const documents = [
+      new Document({
+        pageContent: "hello",
+        metadata: { a: createdAt, loc: { lines: { from: 1, to: 1 } } },
+      }),
+      new Document({
+        pageContent: "car",
+        metadata: { a: createdAt, loc: { lines: { from: 2, to: 2 } } },
+      }),
+    ];
+
+    await store.addDocuments(documents);
+
+    const results1 = await store.similaritySearch("hello!", 1);
+
+    expect(results1).toHaveLength(1);
+    expect(results1).toEqual([
+      new Document({
+        metadata: { a: createdAt, loc: { lines: { from: 1, to: 1 } } },
+        pageContent: "hello",
+      }),
+    ]);
+  });
 });
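
Reviewer note: the test above relies on `metadata.loc` being a nested object rather than a scalar, which is presumably why a blanket keyword mapping could not hold it. The sketch below types that shape as it appears in the test and lists which metadata keys carry object values. The interface and helper names are hypothetical; only the `loc: { lines: { from, to } }` shape comes from the diff.

```typescript
// Shape copied from the test documents in this commit.
interface LineRangeSketch {
  from: number;
  to: number;
}

interface SplitterMetadataSketch {
  a: number;
  loc: { lines: LineRangeSketch };
}

// Return the metadata keys whose values are plain objects, i.e. the keys
// that would need an object mapping rather than a scalar one.
function objectValuedKeys(metadata: object): string[] {
  const record = metadata as Record<string, unknown>;
  return Object.keys(record).filter((key) => {
    const value = record[key];
    return typeof value === "object" && value !== null && !Array.isArray(value);
  });
}

const sampleMetadata: SplitterMetadataSketch = {
  a: Date.now(),
  loc: { lines: { from: 1, to: 1 } },
};

// Only "loc" holds an object; "a" is a number.
const nestedKeys = objectValuedKeys(sampleMetadata);
```

A check like this could be used while debugging to spot other object-valued metadata keys that the mapping does not declare explicitly.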
