Update AI-powered search tutorial (#3056)
guimachiavelli authored Jan 15, 2025
1 parent a3a155b commit 4e903b2
Showing 6 changed files with 138 additions and 47 deletions.
2 changes: 1 addition & 1 deletion guides/embedders/cloudflare.mdx
@@ -55,7 +55,7 @@ In this configuration:
- `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
- `apiKey`: Replace `<API Key>` with your actual Cloudflare API key.
- `dimensions`: Specifies the dimensions of the embeddings. Set to 384 for `baai/bge-small-en-v1.5`, 768 for `baai/bge-base-en-v1.5`, or 1024 for `baai/bge-large-en-v1.5`.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=cloudflare-embeddings-guide#documenttemplate) for generating embeddings from your documents.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
- `url`: Specifies the URL of the Cloudflare Worker AI API endpoint.
- `request`: Defines the request structure for the Cloudflare Worker AI API, including the input parameters.
- `response`: Defines the expected response structure from the Cloudflare Worker AI API, including the embedding data.
2 changes: 1 addition & 1 deletion guides/embedders/cohere.mdx
@@ -58,7 +58,7 @@ In this configuration:
- `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
- `apiKey`: Replace `<Cohere API Key>` with your actual Cohere API key.
- `dimensions`: Specifies the dimensions of the embeddings, set to 1024 for the `embed-english-v3.0` model.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=cohere-embeddings-guide#documenttemplate) for generating embeddings from your documents.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
- `url`: Specifies the URL of the Cohere API endpoint.
- `request`: Defines the request structure for the Cohere API, including the model name and input parameters.
- `response`: Defines the expected response structure from the Cohere API, including the embedding data.
2 changes: 1 addition & 1 deletion guides/embedders/mistral.mdx
@@ -54,7 +54,7 @@ In this configuration:
- `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
- `apiKey`: Replace `<Mistral API Key>` with your actual Mistral API key.
- `dimensions`: Specifies the dimensions of the embeddings, set to 1024 for the `mistral-embed` model.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=mistral-embeddings-guide#documenttemplate) for generating embeddings from your documents.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
- `url`: Specifies the URL of the Mistral API endpoint.
- `request`: Defines the request structure for the Mistral API, including the model name and input parameters.
- `response`: Defines the expected response structure from the Mistral API, including the embedding data.
2 changes: 1 addition & 1 deletion guides/embedders/openai.mdx
@@ -46,7 +46,7 @@ In this configuration:
- `source`: Specifies the source of the embedder, which is set to "openAi" for using OpenAI's API.
- `apiKey`: Replace `<OpenAI API Key>` with your actual OpenAI API key.
- `dimensions`: Specifies the dimensions of the embeddings. Set to 1536 for `text-embedding-3-small` and `text-embedding-ada-002`, or 3072 for `text-embedding-3-large`.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=openai-embeddings-guide#documenttemplate) for generating embeddings from your documents.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
- `model`: Specifies the OpenAI model to use for generating embeddings. Choose from `text-embedding-3-large`, `text-embedding-3-small`, or `text-embedding-ada-002`.

Once you've configured the embedder settings, Meilisearch will automatically generate embeddings for your documents and store them in the vector store.
4 changes: 2 additions & 2 deletions guides/embedders/voyage.mdx
@@ -59,7 +59,7 @@ In this configuration:
- `source`: Specifies the source of the embedder, which is set to "rest" for using a REST API.
- `apiKey`: Replace `<Voyage AI API Key>` with your actual Voyage AI API key.
- `dimensions`: Specifies the dimensions of the embeddings. Set to 1024 for `voyage-2`, `voyage-large-2-instruct`, and `voyage-multilingual-2`, or 1536 for `voyage-large-2`.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search?utm_campaign=vector-search&utm_source=docs&utm_medium=voyage-embeddings-guide#documenttemplate) for generating embeddings from your documents.
- `documentTemplate`: Optionally, you can provide a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) for generating embeddings from your documents.
- `url`: Specifies the URL of the Voyage AI API endpoint.
- `request`: Defines the request structure for the Voyage AI API, including the model name and input parameters.
- `response`: Defines the expected response structure from the Voyage AI API, including the embedding data.
@@ -68,7 +68,7 @@ Once you've configured the embedder settings, Meilisearch will automatically gen

Please note that most third-party tools enforce rate limits, which Meilisearch manages for you. If you have a free account, the indexing process may take some time, but Meilisearch will handle it with a retry strategy.

It's recommended to monitor the tasks queue to ensure everything is running smoothly. You can access the tasks queue using the Cloud UI or the [Meilisearch API](/reference/api/tasks?utm_campaign=vector-search&utm_source=docs&utm_medium=voyage-embeddings-guide#get-tasks).
It's recommended to monitor the tasks queue to ensure everything is running smoothly. You can access the tasks queue using the Cloud UI or the [Meilisearch API](/reference/api/tasks).
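As an illustrative sketch (the filter parameters come from the Meilisearch tasks endpoint; the `MEILISEARCH_URL` environment variable and its local fallback are assumptions for this example, not part of the guide), you could check for failed or still-enqueued tasks like this:

```sh
# List tasks that failed or are still waiting to be processed.
# MEILISEARCH_URL is an assumed environment variable; defaults to a local instance.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

curl -sS \
  -X GET "$MEILISEARCH_URL/tasks?statuses=failed,enqueued" \
  || echo "Could not reach Meilisearch at $MEILISEARCH_URL"
```

Tasks that stay `enqueued` for a while are expected on free provider accounts, since Meilisearch retries rate-limited requests.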

## Testing semantic search

Expand Down
173 changes: 132 additions & 41 deletions learn/ai_powered_search/getting_started_with_ai_search.mdx
@@ -5,9 +5,9 @@ description: AI-powered search is an experimental technology that uses LLMs to r

# Getting started with AI-powered search <NoticeTag type="experimental" label="experimental" />

[AI-powered search](https://meilisearch.com/solutions/vector-search?utm_campaign=vector-search&utm_source=docs&utm_content=getting-started-with-ai-search), sometimes also called vector search and hybrid search, is an experimental technology that uses [large language models](https://en.wikipedia.org/wiki/Large_language_model) to retrieve search results based on the meaning and context of a query.
[AI-powered search](https://meilisearch.com/solutions/vector-search), sometimes also called vector search or hybrid search, is an experimental technology that uses [large language models (LLMs)](https://en.wikipedia.org/wiki/Large_language_model) to retrieve search results based on the meaning and context of a query.

This tutorial will walk you through configuring AI-powered search in your Meilisearch project. You will activate the vector store setting, generate document embeddings with OpenAI, and perform your first search.
This tutorial will walk you through configuring AI-powered search in your Meilisearch project. You will see how to activate this feature, generate document embeddings with OpenAI, and perform your first search.

## Requirements

@@ -17,90 +17,181 @@ This tutorial will walk you through configuring AI-powered search in your Meilis

## Create a new index

Create a `kitchenware` index and add [this kitchenware products dataset](/assets/datasets/kitchenware.json) to it. If necessary, consult the quick start for instructions on how to configure a basic Meilisearch installation.
First, create a new Meilisearch project. If this is your first time using Meilisearch, follow the [quick start](/learn/getting_started/cloud_quick_start) then come back to this tutorial.

Next, create a `kitchenware` index and add [this kitchenware products dataset](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/datasets/kitchenware.json) to it. It will take Meilisearch a few moments to process your request, but you can continue to the next step while your data is indexing.
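If you prefer the command line over the Cloud UI, adding the dataset can be sketched as follows. This snippet assumes you saved the dataset locally as `kitchenware.json`; the `MEILISEARCH_URL` variable and its local fallback are illustrative, not part of the tutorial:

```sh
# Add the downloaded dataset to the kitchenware index.
# Meilisearch creates the index automatically if it does not exist yet.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

curl -sS \
  -X POST "$MEILISEARCH_URL/indexes/kitchenware/documents" \
  -H 'Content-Type: application/json' \
  --data-binary @kitchenware.json \
  || echo "Could not reach Meilisearch at $MEILISEARCH_URL"
```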

## Activate AI-powered search

First, activate the AI-powered search experimental feature. Exactly how to do that depends on whether you are using [Meilisearch Cloud](#meilisearch-cloud-projects) or [self-hosting Meilisearch](#self-hosted-instances).
AI-powered search is an experimental feature and is disabled by default. You must manually activate it, either via the Meilisearch Cloud UI or with the experimental features endpoint.

### Meilisearch Cloud projects
<Capsule intent="tip" title="Meilisearch Cloud AI-powered search waitlist">
To use AI-powered search with Meilisearch Cloud, you must first enter the waitlist. You will not be able to activate vector search until your sign-up has been approved.
</Capsule>

If using Meilisearch Cloud, navigate to your project overview and find "Experimental features". Then check the "AI-powered search" box.
### Meilisearch Cloud UI

![A section of the project overview interface titled "Experimental features". The image shows a few options, including "Vector store".](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/images/vector-search/01-cloud-vector-store.png)
Navigate to your project overview and find "Experimental features". Then click on the "AI-powered search" box.

<Capsule intent="note" title="Meilisearch Cloud AI-powered search waitlist">
To ensure proper scaling of Meilisearch Cloud's latest AI-powered search offering, you must enter the waitlist before activating vector search. You will not be able to activate vector search in the Cloud interface or via the `/experimental-features` route until your sign-up has been approved.
</Capsule>
![A section of the project overview interface titled "Experimental features". The image shows a few options, including "Vector store".](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/images/vector-search/01-cloud-vector-store.png)

### Self-hosted instances
### Experimental features endpoint

Use [the `/experimental-features` route](/reference/api/experimental_features?utm_campaign=vector-search&utm_source=docs&utm_medium=vector-search-guide) to activate vector search during runtime:
Use [the `/experimental-features` route](/reference/api/experimental_features) to activate vector search during runtime:

```sh
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-X PATCH 'MEILISEARCH_URL/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"vectorStore": true
}'
```

## Generate vector embeddings with OpenAI
Replace `MEILISEARCH_URL` with your project's URL. In most cases, this should look like `https://ms-000xx00x000-xx.xxx.meilisearch.io` if you're using Meilisearch Cloud, or `http://localhost:7700` if you are running Meilisearch on your local machine.
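To double-check that the flag took effect, you can read the current experimental feature flags back. This is a sketch using the same placeholder conventions; `vectorStore` should be `true` in the response:

```sh
# Read back the experimental feature flags.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

curl -sS \
  -X GET "$MEILISEARCH_URL/experimental-features/" \
  || echo "Could not reach Meilisearch at $MEILISEARCH_URL"
```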

## Generate embeddings with OpenAI

In this step, you will configure an OpenAI embedder. Meilisearch uses **embedders** to translate documents into **embeddings**, which are mathematical representations of a document's meaning and context.

Open a blank file in your text editor. You will only use this file to build your embedder one step at a time, so there's no need to save it if you plan to finish the tutorial in one sitting.

### Choose an embedder name

In your blank file, create your `embedder` object:

```json
{
"products-openai": {}
}
```

`products-openai` is the name of your embedder for this tutorial. You can name embedders any way you want, but try to keep it simple, short, and easy to remember.

### Choose an embedder source

Meilisearch relies on third-party services to generate embeddings. These services are often referred to as the embedder source.

Add a new `source` field to your embedder object:

```json
{
"products-openai": {
"source": "openai"
}
}
```

Meilisearch supports several embedder sources. This tutorial uses OpenAI because it is a good option that fits most use cases.

### Choose an embedder model

Models supply the information required for embedders to process your documents.

Add a new `model` field to your embedder object:

```json
{
"products-openai": {
"source": "openai",
"model": "text-embedding-3-small"
}
}
```

Next, you must generate vector embeddings for all documents in your dataset. Embeddings are mathematical representations of the meanings of words and sentences in your documents. Meilisearch relies on external providers to generate these embeddings. This tutorial uses an OpenAI embedder, but Meilisearch also supports embedders from HuggingFace, Ollama, and any embedder accessible via a RESTful API.
Each embedder service supports different models targeting specific use cases. `text-embedding-3-small` is a cost-effective model for general usage.

Use the `embedders` index setting of the [update `/settings` endpoint](/reference/api/settings?utm_campaign=vector-search&utm_source=docs&utm_medium=vector-search-guide) to configure an [OpenAI](https://platform.openai.com/) embedder:
### Create your API key

Log into OpenAI, or create an account if this is your first time using it. Generate a new API key using [OpenAI's web interface](https://platform.openai.com/api-keys).

Add the `apiKey` field to your embedder:

```json
{
"products-openai": {
"source": "openai",
"model": "text-embedding-3-small",
"apiKey": "OPEN_AI_API_KEY"
}
}
```

Replace `OPEN_AI_API_KEY` with your own API key.

<Capsule intent="tip" title="OpenAI key tiers">
You may use any key tier for this tutorial. Use at least [Tier 2 keys](https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-two) in production environments.
</Capsule>

### Design a prompt template

Meilisearch embedders only accept textual input, but documents can be complex objects containing different types of data. This means you must convert your documents into a single text field. Meilisearch uses [Liquid](https://shopify.github.io/liquid/basics/introduction/), an open-source templating language, to help you do that.

A good template should be short and only include the most important information about a document. Add the following `documentTemplate` to your embedder:

```json
{
"products-openai": {
"source": "openai",
"model": "text-embedding-3-small",
"apiKey": "OPEN_AI_API_KEY",
"documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
}
}
```

This template starts by giving the general context of the document: `An object used in a kitchen`. It then adds information specific to each document: `doc` represents your document, and you can access any of its attributes using dot notation. `name` is an attribute with values such as `wooden spoon` or `rolling pin`. Since it is present in all documents in this dataset and describes the product in a few words, it is a good choice to include in the template.

### Create the embedder

Your embedder object is ready. Send it to Meilisearch by updating your index settings:

```sh
curl \
-X PATCH 'http://localhost:7700/indexes/kitchenware/settings' \
-X PATCH 'MEILISEARCH_URL/indexes/kitchenware/settings/embedders' \
-H 'Content-Type: application/json' \
--data-binary '{
"embedders": {
"openai": {
"source": "openAi",
"apiKey": "OPEN_AI_API_KEY",
"model": "text-embedding-3-small",
"documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
}
"products-openai": {
"source": "openAi",
"apiKey": "OPEN_AI_API_KEY",
"model": "text-embedding-3-small",
"documentTemplate": "An object used in a kitchen named '{{doc.name}}'"
}
}'
```

Replace `OPEN_AI_API_KEY` with your [OpenAI API key](https://platform.openai.com/api-keys). You may use any key tier for this tutorial, but prefer [Tier 2 keys](https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-two) for optimal performance in production environments.

### `documentTemplate`

`documentTemplate` describes a short [Liquid template](https://shopify.github.io/liquid/). The text inside curly brackets (`{{`) indicates a document field in dot notation, where `doc` indicates the document itself and the string that comes after the dot indicates a document attribute. Meilisearch replaces these brackets and their contents with the corresponding field value.

The resulting text is the prompt OpenAI uses to generate document embeddings.
Replace `MEILISEARCH_URL` with the address of your Meilisearch project, and `OPEN_AI_API_KEY` with your [OpenAI API key](https://platform.openai.com/api-keys).

For example, kitchenware documents have three fields: `id`, `name`, and `price`. If your `documentTemplate` is `"An object used in a kitchen named '{{doc.name}}'"`, the text Meilisearch will send to the embedder when indexing the first document is `"An object used in a kitchen named 'Wooden spoon'"`.

For the best results, always provide a `documentTemplate`. Keep your templates short and only include highly relevant information. This ensures optimal indexing performance and search result relevancy.
Meilisearch and OpenAI will start processing your documents and updating your index. This may take a few moments, but once it's done you are ready to perform an AI-powered search.
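If you want to confirm that the settings update has finished before searching, you can filter the tasks queue by index and task type. This is illustrative; the `MEILISEARCH_URL` variable and its fallback are assumptions:

```sh
# Check the status of settings updates for the kitchenware index.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

curl -sS \
  -X GET "$MEILISEARCH_URL/tasks?indexUids=kitchenware&types=settingsUpdate" \
  || echo "Could not reach Meilisearch at $MEILISEARCH_URL"
```

Once the most recent task's `status` is `succeeded`, the embedder is ready.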

## Perform an AI-powered search

Perform AI-powered searches with `q` and `hybrid` to retrieve search results using the default embedder you configured in the previous step:
AI-powered searches are very similar to basic text searches. You must query the `/search` endpoint with a request containing both the `q` and the `hybrid` parameters:

```sh
curl \
-X POST 'http://localhost:7700/indexes/kitchenware/search' \
-X POST 'MEILISEARCH_URL/indexes/kitchenware/search' \
-H 'content-type: application/json' \
--data-binary '{
"q": "kitchen utensils made of wood",
"hybrid": {
"embedder": "openai",
"semanticRatio": 0.7
"embedder": "products-openai"
}
}'
```

Meilisearch will return a mix of semantic and full-text matches, prioritizing results that match the query's meaning and context. If you want Meilisearch to return more results based on the meaning and context of a search, set `semanticRatio` to a value greater than `0.5`. Setting `semanticRatio` to a value lower than `0.5`, instead, will return more full-text matches.
For this tutorial, `hybrid` is an object with a single `embedder` field.

Meilisearch will then return an equal mix of semantic and full-text matches.
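The `hybrid` object also accepts an optional `semanticRatio` between `0` and `1` to change that mix: values above `0.5` favor semantic matches, values below `0.5` favor full-text matches. A semantic-leaning version of the search could look like this (same placeholder conventions as above; the variable fallback is illustrative):

```sh
# Same search, weighted toward semantic matches.
MEILISEARCH_URL="${MEILISEARCH_URL:-http://localhost:7700}"

curl -sS \
  -X POST "$MEILISEARCH_URL/indexes/kitchenware/search" \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "kitchen utensils made of wood",
    "hybrid": {
      "embedder": "products-openai",
      "semanticRatio": 0.7
    }
  }' \
  || echo "Could not reach Meilisearch at $MEILISEARCH_URL"
```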

## Conclusion

You have seen how to set up and perform AI-powered searches with Meilisearch and OpenAI. For more in-depth information, consult the reference for embedders and the `hybrid` search parameter.
Congratulations! You have created an index, added a small dataset to it, and activated AI-powered search. You then used OpenAI to generate embeddings from your documents and performed your first AI-powered search.

## Next steps

Now that you have an overview of the basic steps required to set up and perform AI-powered searches, you might want to implement this feature in your own application.

For practical information on implementing AI-powered search with other services, consult our [guides section](/guides/ai/openai). There you will find specific instructions for embedders such as [LangChain](/guides/ai/langchain) and [Cloudflare](/guides/ai/cloudflare).

AI-powered search is an experimental Meilisearch feature under active development. [Join the discussion on GitHub](https://github.com/orgs/meilisearch/discussions/677).
For more in-depth information, consult the API reference for [embedder settings](/reference/api/settings#embedders-experimental) and [the `hybrid` search parameter](/reference/api/search#hybrid-search-experimental).
