docs: adds documentation on vector search
micheleriva committed Aug 4, 2023
1 parent 301045b commit d349480
Showing 7 changed files with 112 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -33,7 +33,7 @@ the

# Highlighted features

- [Vector Search](https://docs.oramasearch.com/usage/search/vectors)
- [Vector Search](https://docs.oramasearch.com/usage/search/vector-search)
- [Search filters](https://docs.oramasearch.com/usage/search/filters)
- [Facets](https://docs.oramasearch.com/usage/search/facets)
- [Fields Boosting](https://docs.oramasearch.com/usage/search/fields-boosting)
4 changes: 4 additions & 0 deletions packages/docs/pages/_meta.json
@@ -3,6 +3,10 @@
"type": "menu",
"title": "Search Features",
"items": {
  "vector-search": {
    "title": "Vector Search",
    "href": "/usage/search/vector-search"
  },
  "typo-tolerance": {
    "title": "Typo Tolerance",
    "href": "/usage/search/introduction#typo-tolerance"
40 changes: 32 additions & 8 deletions packages/docs/pages/usage/create.mdx
@@ -18,14 +18,15 @@ If you want to learn more and see real-world examples, check out [this blog post
The `schema` is an object where the keys are the property names and the values are the property types. \
Orama supports the following types:

| Type | Description | example |
| ----------- | --------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `string` | A string of characters. | `'Hello world'` |
| `number` | A numeric value, either float or integer. | `42` |
| `boolean` | A boolean value. | `true` |
| `string[]` | An array of strings. | `['red', 'green', 'blue']` |
| `number[]` | An array of numbers. | `[42, 91, 28.5]` |
| `boolean[]` | An array of booleans. | `[true, false, false]` |
| Type | Description | example |
| ---------------- | --------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `string` | A string of characters. | `'Hello world'` |
| `number` | A numeric value, either float or integer. | `42` |
| `boolean` | A boolean value. | `true` |
| `string[]` | An array of strings. | `['red', 'green', 'blue']` |
| `number[]` | An array of numbers. | `[42, 91, 28.5]` |
| `boolean[]` | An array of booleans. | `[true, false, false]` |
| `vector[<size>]` | A vector of numbers to perform vector search on. | `[0.403, 0.192, 0.830]` |

A database can be as simple as:

@@ -75,6 +76,29 @@ const movieDB = await create({
})
```

## Vector properties

Since version `1.2.0`, Orama supports vector search. \
To run vector queries, you first need to initialize a vector property in the schema:

```javascript copy
const db = await create({
  schema: {
    title: 'string',
    embedding: 'vector[384]',
  }
})
```

Please note that the size of the vector **must** be specified in the schema. \
The size is the number of elements the vector contains. Make sure to specify it correctly: searching across vectors of different sizes produces unpredictable and mostly wrong results.
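
One way to catch a size mismatch early is to check each embedding against the size declared in the schema before inserting it. The sketch below is only an illustration and is not part of Orama: `parseVectorSize` and `assertVectorSize` are hypothetical helpers, and the code assumes the schema type string always follows the `vector[<size>]` form described here.

```javascript
// Hypothetical helpers (not part of Orama): validate an embedding
// against the size declared in a `vector[<size>]` schema type.

// Extracts the declared size from a schema type such as 'vector[384]'.
function parseVectorSize(schemaType) {
  const match = /^vector\[(\d+)\]$/.exec(schemaType)
  if (!match) {
    throw new Error(`Not a vector type: ${schemaType}`)
  }
  return Number(match[1])
}

// Throws unless the embedding is an array of finite numbers
// whose length equals the declared size.
function assertVectorSize(embedding, schemaType) {
  const expected = parseVectorSize(schemaType)
  if (!Array.isArray(embedding) || embedding.length !== expected) {
    throw new Error(`Expected a vector of size ${expected}`)
  }
  if (!embedding.every((n) => typeof n === 'number' && Number.isFinite(n))) {
    throw new Error('Vector elements must be finite numbers')
  }
  return embedding
}
```

Calling something like `assertVectorSize(doc.embedding, 'vector[384]')` before each insert would surface a mismatch as an explicit error rather than as degraded search results.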

If you're using vector properties to search through embeddings, we highly recommend the `gte-small` model, available on [Hugging Face](https://huggingface.co/), which has a vector size of `384`.

Supabase has published a great article explaining why it might be a better option than OpenAI's `text-embedding-ada-002` model: [https://supabase.com/blog/fewer-dimensions-are-better-pgvector](https://supabase.com/blog/fewer-dimensions-are-better-pgvector).

For performance reasons, we recommend using one vector property per database, even though it's possible to have multiple vector properties in the same Orama instance.

## Instance ID

Every Orama instance has a unique `id` property, which can be used to identify a given instance when working with multiple databases.
1 change: 1 addition & 0 deletions packages/docs/pages/usage/search/_meta.json
@@ -1,5 +1,6 @@
{
  "introduction": "Searching with Orama",
  "vector-search": "Vector Search",
  "fields-boosting": "Fields boosting",
  "facets": "Facets",
  "filters": "Filters",
73 changes: 73 additions & 0 deletions packages/docs/pages/usage/search/vector-search.mdx
@@ -0,0 +1,73 @@
import { Callout } from 'nextra-theme-docs'

# Vector Search

Since `v1.2.0`, Orama supports **vector search** natively 🎉.

To search through vectors, you need to configure your Orama schema correctly, as described on the [create page](/usage/create).

## Performing Vector Search

To perform vector search, you will need to use a new method called `searchVector`, which can be imported from `@orama/orama`:

```js copy
import { searchVector } from '@orama/orama'
```

The APIs are very similar to the ones you already know, but with a few differences:

1. Instead of searching for a `term`, you will need to provide a `vector` to search for.
2. You will need to specify the vector property you want to search on.
3. At the time of writing, you can only search through one vector property at a time. If you think that this is too limiting, please open a [feature request](https://github.com/oramasearch/orama/issues/new?assignees=&labels=&projects=&template=feature_request.md&title=) to support multiple vector properties at search-time.

Let's see a full example of how to perform vector search:

```js copy
import { create, insertMultiple, searchVector } from '@orama/orama'

const db = await create({
  schema: {
    title: 'string',        // To make it simple, let's pretend that
    embedding: 'vector[5]', // we are using a 5-dimensional vector.
  }
})

await insertMultiple(db, [
  { title: 'The Prestige', embedding: [0.938293, 0.284951, 0.348264, 0.948276, 0.564720] },
  { title: 'Barbie', embedding: [0.192839, 0.028471, 0.284738, 0.937463, 0.092827] },
  { title: 'Oppenheimer', embedding: [0.827391, 0.927381, 0.001982, 0.983821, 0.294841] },
])

const results = await searchVector(db, {
  vector: [0.938292, 0.284961, 0.248264, 0.748276, 0.264720],
  property: 'embedding',
  similarity: 0.8,      // Minimum similarity. Defaults to `0.8`
  includeVectors: true, // Defaults to `false`
  limit: 10,            // Defaults to `10`
  offset: 0,            // Defaults to `0`
})
```
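
To build some intuition for the `similarity` threshold, the sketch below ranks the same three documents by brute force. It assumes the threshold is compared against plain cosine similarity, which this page does not state; `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers and not Orama APIs.

```javascript
// Plain cosine similarity. This is an assumption about how `similarity`
// is interpreted, not Orama's actual implementation.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same size')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical brute-force ranking: keep documents whose score is at or
// above the threshold, best match first.
function rankBySimilarity(docs, vector, property, similarity = 0.8) {
  return docs
    .map((document) => ({ document, score: cosineSimilarity(document[property], vector) }))
    .filter((hit) => hit.score >= similarity)
    .sort((x, y) => y.score - x.score)
}

const docs = [
  { title: 'The Prestige', embedding: [0.938293, 0.284951, 0.348264, 0.948276, 0.564720] },
  { title: 'Barbie', embedding: [0.192839, 0.028471, 0.284738, 0.937463, 0.092827] },
  { title: 'Oppenheimer', embedding: [0.827391, 0.927381, 0.001982, 0.983821, 0.294841] },
]
const query = [0.938292, 0.284961, 0.248264, 0.748276, 0.264720]

const hits = rankBySimilarity(docs, query, 'embedding', 0.8)
```

Raising the threshold narrows the set of hits, and lowering it widens the set. Scores from this sketch are not guaranteed to match the ones Orama reports.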

The object returned by `searchVector` has the same shape as the one returned by the default `search` method:

```js
{
  count: 1,
  elapsed: {
    raw: 25000,
    formatted: '25ms',
  },
  hits: [
    {
      id: '1-19238',
      score: 0.812383129,
      document: {
        title: 'The Prestige',
        embedding: [0.938293, 0.284951, 0.348264, 0.948276, 0.564720],
      }
    }
  ]
}
```

Since vectors can be quite large, you can exclude them from the response by setting `includeVectors` to `false`, which is the default.
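
If a search was run with `includeVectors: true` but the vectors are not needed later on (for example, before sending results to a client), they can be dropped from a copy of the result. This is a minimal sketch that assumes the result shape shown above; `stripVectors` is a hypothetical helper, not an Orama API.

```javascript
// Hypothetical helper (not part of Orama): returns a copy of a result
// object whose hits no longer carry the given vector property.
function stripVectors(results, property) {
  return {
    ...results,
    hits: results.hits.map((hit) => {
      // Copy every field except the vector property; the input is left untouched.
      const { [property]: _omitted, ...document } = hit.document
      return { ...hit, document }
    }),
  }
}
```

Because the helper builds new objects instead of assigning to the existing documents, the original result stays intact.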
2 changes: 1 addition & 1 deletion packages/orama/README.md
@@ -13,7 +13,7 @@ If you need more info, help, or want to provide general feedback on Orama, join

# Highlighted features

- [Vector Search](https://docs.oramasearch.com/usage/search/vectors)
- [Vector Search](https://docs.oramasearch.com/usage/search/vector-search)
- [Search filters](https://docs.oramasearch.com/usage/search/filters)
- [Facets](https://docs.oramasearch.com/usage/search/facets)
- [Fields Boosting](https://docs.oramasearch.com/usage/search/fields-boosting)
1 change: 0 additions & 1 deletion packages/orama/src/methods/search-vector.ts
@@ -43,7 +43,6 @@ export async function searchVector(orama: Orama, params: SearchVectorParams): Pr
const doc = (orama.data.docs as any).docs[originalID]

if (doc) {
  // TODO: manage multiple vector properties
  if (!includeVectors) {
    doc[property] = null
  }