Commit

Merge branch 'main' into fix/max-call-stack-length
H4ad authored Aug 16, 2023
2 parents 98a91fe + 4f6b61f commit 855b7fa
Showing 45 changed files with 940 additions and 169 deletions.
20 changes: 17 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
</h4>
<br />
<p align="center">
A resilient, innovative and open-source search experience to achieve <br />
A resilient, innovative and open-source full-text and vector search experience to achieve <br />
seamless integration with your infrastructure and data
</p>
<br />
Expand All @@ -29,10 +29,11 @@

If you need more info, help, or want to provide general feedback on Orama, join
the
[Orama Slack channel](https://join.slack.com/t/orama-community/shared_invite/zt-1gzvj0mmt-yJhJ6pnrSGuwqPmPx9uO5Q)
[Orama Slack channel](https://orama.to/slack)

# Highlighted features

- [Vector Search](https://docs.oramasearch.com/usage/search/vector-search)
- [Search filters](https://docs.oramasearch.com/usage/search/filters)
- [Facets](https://docs.oramasearch.com/usage/search/facets)
- [Fields Boosting](https://docs.oramasearch.com/usage/search/fields-boosting)
Expand Down Expand Up @@ -78,13 +79,14 @@ Orama is quite simple to use. The first thing to do is to create a new database
instance and set an indexing schema:

```js
import { create, insert, remove, search } from '@orama/orama'
import { create, insert, remove, search, searchVector } from '@orama/orama'

const db = await create({
schema: {
name: 'string',
description: 'string',
price: 'number',
embedding: 'vector[1536]', // Vector size must be expressed during schema initialization
meta: {
rating: 'number',
},
Expand All @@ -104,6 +106,7 @@ await insert(db, {
name: 'Wireless Headphones',
description: 'Experience immersive sound quality with these noise-cancelling wireless headphones.',
price: 99.99,
embedding: [...],
meta: {
rating: 4.5,
},
Expand All @@ -113,6 +116,7 @@ await insert(db, {
name: 'Smart LED Bulb',
description: 'Control the lighting in your home with this energy-efficient smart LED bulb, compatible with most smart home systems.',
price: 24.99,
embedding: [...],
meta: {
rating: 4.3,
},
Expand All @@ -122,6 +126,7 @@ await insert(db, {
name: 'Portable Charger',
description: 'Never run out of power on-the-go with this compact and fast-charging portable charger for your devices.',
price: 29.99,
embedding: [...],
meta: {
rating: 3.6,
},
Expand Down Expand Up @@ -198,6 +203,15 @@ Result:
}
```

If you want to perform a vector search, you can use the `searchVector` function:

```js
const searchResult = await searchVector(db, {
vector: [...], // OpenAI embedding or similar vector to be used as an input
property: 'embedding' // Property to search through. Mandatory for vector search
})
```

# Usage with CommonJS

Orama is packaged as ES modules, suitable for Node.js, Deno, Bun and modern browsers.
Expand Down
4 changes: 2 additions & 2 deletions package.json
@@ -1,7 +1,7 @@
{
"name": "orama-monorepo",
"version": "1.1.0",
"description": "Next generation full-text search engine, written in TypeScript",
"version": "1.2.1",
"description": "Next generation full-text and vector search engine, written in TypeScript",
"workspaces": [
"packages/*"
],
Expand Down
2 changes: 1 addition & 1 deletion packages/benchmarks/package.json
@@ -1,6 +1,6 @@
{
"name": "benchmarks",
"version": "1.1.0",
"version": "1.2.1",
"private": true,
"scripts": {
"bench:group": "node src/group.bench.js",
Expand Down
2 changes: 1 addition & 1 deletion packages/docs/package.json
@@ -1,6 +1,6 @@
{
"name": "@orama/docs",
"version": "1.1.0",
"version": "1.2.1",
"description": "Documentation for Orama",
"private": true,
"main": "index.js",
Expand Down
4 changes: 4 additions & 0 deletions packages/docs/pages/_meta.json
Expand Up @@ -3,6 +3,10 @@
"type": "menu",
"title": "Search Features",
"items": {
"vector-search": {
"title": "Vector Search",
"href": "/usage/search/vector-search"
},
"typo-tolerance": {
"title": "Typo Tolerance",
"href": "/usage/search/introduction#typo-tolerance"
Expand Down
2 changes: 1 addition & 1 deletion packages/docs/pages/index.mdx
Expand Up @@ -4,7 +4,7 @@ import { AiFillFileAdd, AiOutlineSearch, AiFillDelete } from 'react-icons/ai'

# Getting Started with Orama

Orama is a fast, batteries-included, full-text search engine entirely written in TypeScript, with zero dependencies. <br /><br />
Orama is a fast, batteries-included, full-text and vector search engine entirely written in TypeScript, with zero dependencies. <br /><br />

<iframe
width="100%"
Expand Down
2 changes: 2 additions & 0 deletions packages/docs/pages/plugins/plugin-parsedoc.mdx
Expand Up @@ -53,6 +53,7 @@ An asynchronous function that takes three arguments:
- `globPath`: a string representing a glob path to read the files from.
- `options`: an object containing the following properties:
- `transformFn` (optional): a function that receives an object as its only argument. The object contains the raw HTML/Markdown chunk, tag name, parsed content and HTML attributes.
If the function adds an `additionalProperties` object to the transformed node, it will be merged with the original node's properties.
- `mergeStrategy` (optional): a value that defines how to handle consecutive chunks of the same tag. The default value is `merge`. Accepted values are:
- `merge`: consecutive chunks with the same tag will be merged into one document for the index.
- `split`: consecutive chunks with the same tag will be split into separate documents for the index.
Expand All @@ -67,6 +68,7 @@ An asynchronous function that takes three arguments. Should be used internally by
- `fileType`: a string representing the file type. Accepted values are `html` and `md`.
- `options`: an object containing the following properties:
- `transformFn` (optional): a function that receives an object as its only argument. The object contains the raw HTML/Markdown chunk, tag name, parsed content and HTML attributes.
If the function adds an `additionalProperties` object to the transformed node, it will be merged with the original node's properties.
- `mergeStrategy` (optional): a value that defines how to handle consecutive chunks of the same tag. The default value is `merge`. Accepted values are:
- `merge`: consecutive chunks with the same tag will be merged into one document for the index.
- `split`: consecutive chunks with the same tag will be split into separate documents for the index.
Expand Down
40 changes: 32 additions & 8 deletions packages/docs/pages/usage/create.mdx
Expand Up @@ -18,14 +18,15 @@ If you want to learn more and see real-world examples, check out [this blog post
The `schema` is an object where the keys are the property names and the values are the property types. \
Orama supports the following types:

| Type | Description | example |
| ----------- | --------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `string` | A string of characters. | `'Hello world'` |
| `number` | A numeric value, either float or integer. | `42` |
| `boolean` | A boolean value. | `true` |
| `string[]` | An array of strings. | `['red', 'green', 'blue']` |
| `number[]` | An array of numbers. | `[42, 91, 28.5]` |
| `boolean[]` | An array of booleans. | `[true, false, false]` |
| Type | Description | example |
| ---------------- | --------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `string` | A string of characters. | `'Hello world'` |
| `number` | A numeric value, either float or integer. | `42` |
| `boolean` | A boolean value. | `true` |
| `string[]` | An array of strings. | `['red', 'green', 'blue']` |
| `number[]` | An array of numbers. | `[42, 91, 28.5]` |
| `boolean[]` | An array of booleans. | `[true, false, false]` |
| `vector[<size>]` | A vector of numbers to perform vector search on. | `[0.403, 0.192, 0.830]` |

A database can be as simple as:

Expand Down Expand Up @@ -75,6 +76,29 @@ const movieDB = await create({
})
```

## Vector properties

Since version `1.2.0`, Orama supports vector search. \
To run vector queries, you first need to initialize a vector property in the schema:

```javascript copy
const db = await create({
schema: {
title: 'string',
embedding: 'vector[384]',
}
})
```

Please note that the size of the vector **must** be specified in the schema. \
The size of the vector is the number of elements it contains. Make sure to specify the correct size, as searching across vectors of different sizes produces unpredictable and mostly wrong results.
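
As a minimal sketch of guarding against size mismatches before calling `insert`, the helpers below parse the declared size out of a schema type string and check a vector against it. `parseVectorSize` and `assertVectorSize` are hypothetical names used for illustration and are not part of the Orama API.

```js
// Hypothetical helper (not part of Orama): extracts the declared size
// from a schema type string such as 'vector[384]'.
function parseVectorSize(schemaType) {
  const match = /^vector\[(\d+)\]$/.exec(schemaType)
  if (!match) {
    throw new Error(`Not a vector type: ${schemaType}`)
  }
  return Number(match[1])
}

// Hypothetical helper (not part of Orama): throws if the vector does not
// have exactly the expected number of finite numeric elements.
function assertVectorSize(vector, expectedSize) {
  if (!Array.isArray(vector) || vector.length !== expectedSize) {
    const actual = Array.isArray(vector) ? vector.length : typeof vector
    throw new Error(`Expected a vector of size ${expectedSize}, got ${actual}`)
  }
  if (!vector.every((n) => typeof n === 'number' && Number.isFinite(n))) {
    throw new Error('Vector must contain only finite numbers')
  }
  return vector
}
```

Calling `assertVectorSize(doc.embedding, parseVectorSize('vector[384]'))` before each insert would surface a wrong embedding model or a truncated vector early, instead of at search time.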

If you're using vector properties to search through embeddings, we highly recommend using [HuggingFace's](https://huggingface.co/) `gte-small` model, which has a vector size of `384`.

There is a great article written by Supabase explaining why it might be a better option than OpenAI's `text-embedding-ada-002` model: [https://supabase.com/blog/fewer-dimensions-are-better-pgvector](https://supabase.com/blog/fewer-dimensions-are-better-pgvector).

For performance reasons, we recommend using one vector property per database, even though it's possible to have multiple vector properties in the same Orama instance.

## Instance ID

Every Orama instance has a unique `id` property, which can be used to identify a given instance when working with multiple databases.
Expand Down
1 change: 1 addition & 0 deletions packages/docs/pages/usage/search/_meta.json
@@ -1,5 +1,6 @@
{
"introduction": "Searching with Orama",
"vector-search": "Vector Search",
"fields-boosting": "Fields boosting",
"facets": "Facets",
"filters": "Filters",
Expand Down
73 changes: 73 additions & 0 deletions packages/docs/pages/usage/search/vector-search.mdx
@@ -0,0 +1,73 @@
import { Callout } from 'nextra-theme-docs'

# Vector Search

Since `v1.2.0`, Orama supports **vector search** natively 🎉.

To search through vectors, you first need to configure your Orama schema with a vector property, as described in the [create page](/usage/create).

## Performing Vector Search

To perform vector search, you will need to use a new method called `searchVector`, which can be imported from `@orama/orama`:

```js copy
import { searchVector } from '@orama/orama'
```

The APIs are very similar to the ones you already know, but with a few differences:

1. Instead of searching for a `term`, you will need to provide a `vector` to search for.
2. You will need to specify the vector property you want to search on.
3. At the time of writing, you can only search through one vector property at a time. If you think that this is too limiting, please open a [feature request](https://github.com/oramasearch/orama/issues/new?assignees=&labels=&projects=&template=feature_request.md&title=) to support multiple vector properties at search-time.

Let's see a full example of how to perform vector search:

```js copy
import { create, insertMultiple, searchVector } from '@orama/orama'

const db = await create({
schema: {
title: 'string', // To make it simple, let's pretend that
embedding: 'vector[5]', // we are using a 5-dimensional vector.
}
})

await insertMultiple(db, [
{ title: 'The Prestige', embedding: [0.938293, 0.284951, 0.348264, 0.948276, 0.564720] },
{ title: 'Barbie', embedding: [0.192839, 0.028471, 0.284738, 0.937463, 0.092827] },
{ title: 'Oppenheimer', embedding: [0.827391, 0.927381, 0.001982, 0.983821, 0.294841] },
])

const results = await searchVector(db, {
vector: [0.938292, 0.284961, 0.248264, 0.748276, 0.264720],
property: 'embedding',
similarity: 0.8, // Minimum similarity. Defaults to `0.8`
includeVectors: true, // Defaults to `false`
limit: 10, // Defaults to `10`
offset: 0, // Defaults to `0`
})
```

The returned object has exactly the same shape as the one returned by the default `search` method:

```js
{
count: 1,
elapsed: {
raw: 25000,
formatted: '25μs',
},
hits: [
{
id: '1-19238',
score: 0.812383129,
document: {
title: 'The Prestige',
embedding: [0.938293, 0.284951, 0.348264, 0.948276, 0.564720],
}
}
]
}
```

Since vectors can be quite large, you can also choose to not include them in the response by setting `includeVectors` to `false` (default behavior).
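
To build intuition for the `similarity` threshold, here is a minimal brute-force sketch that assumes cosine similarity as the metric. The text above does not say which metric or index structure Orama uses internally, so treat this as an illustration and not as Orama's implementation. `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers.

```js
// Cosine similarity between two equally sized numeric vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same size')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Scores every document against the query vector, drops hits below the
// threshold, and sorts the rest from most to least similar.
function rankBySimilarity(docs, query, property, threshold = 0.8) {
  return docs
    .map((document) => ({
      document,
      score: cosineSimilarity(document[property], query),
    }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
}
```

Under this assumption, raising the threshold trades recall for precision: fewer documents pass the filter, but those that do point in nearly the same direction as the query vector.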
22 changes: 17 additions & 5 deletions packages/orama/README.md
Expand Up @@ -13,6 +13,7 @@ If you need more info, help, or want to provide general feedback on Orama, join

# Highlighted features

- [Vector Search](https://docs.oramasearch.com/usage/search/vector-search)
- [Search filters](https://docs.oramasearch.com/usage/search/filters)
- [Facets](https://docs.oramasearch.com/usage/search/facets)
- [Fields Boosting](https://docs.oramasearch.com/usage/search/fields-boosting)
Expand Down Expand Up @@ -58,13 +59,14 @@ Orama is quite simple to use. The first thing to do is to create a new database
instance and set an indexing schema:

```js
import { create, insert, remove, search } from '@orama/orama'
import { create, insert, remove, search, searchVector } from '@orama/orama'

const db = await create({
schema: {
name: 'string',
description: 'string',
price: 'number',
embedding: 'vector[1536]', // Vector size must be expressed during schema initialization
meta: {
rating: 'number',
},
Expand All @@ -84,26 +86,27 @@ await insert(db, {
name: 'Wireless Headphones',
description: 'Experience immersive sound quality with these noise-cancelling wireless headphones.',
price: 99.99,
embedding: [...],
meta: {
rating: 4.5,
},
})

await insert(db, {
name: 'Smart LED Bulb',
description:
'Control the lighting in your home with this energy-efficient smart LED bulb, compatible with most smart home systems.',
description: 'Control the lighting in your home with this energy-efficient smart LED bulb, compatible with most smart home systems.',
price: 24.99,
embedding: [...],
meta: {
rating: 4.3,
},
})

await insert(db, {
name: 'Portable Charger',
description:
'Never run out of power on-the-go with this compact and fast-charging portable charger for your devices.',
description: 'Never run out of power on-the-go with this compact and fast-charging portable charger for your devices.',
price: 29.99,
embedding: [...],
meta: {
rating: 3.6,
},
Expand Down Expand Up @@ -180,6 +183,15 @@ Result:
}
```

If you want to perform a vector search, you can use the `searchVector` function:

```js
const searchResult = await searchVector(db, {
vector: [...], // OpenAI embedding or similar vector to be used as an input
property: 'embedding' // Property to search through. Mandatory for vector search
})
```

# Usage with CommonJS

Orama is packaged as ES modules, suitable for Node.js, Deno, Bun and modern browsers.
Expand Down
12 changes: 10 additions & 2 deletions packages/orama/package.json
@@ -1,8 +1,8 @@
{
"name": "@orama/orama",
"version": "1.1.0",
"version": "1.2.1",
"type": "module",
"description": "Next generation full-text search engine, written in TypeScript",
"description": "Next generation full-text and vector search engine, written in TypeScript",
"sideEffects": false,
"main": "./dist/cjs/index.cjs",
"exports": {
Expand Down Expand Up @@ -46,6 +46,9 @@
},
"keywords": [
"full-text search",
"vector search",
"vector database",
"vectors",
"search",
"fuzzy search",
"typo-tolerant search",
Expand All @@ -58,6 +61,11 @@
"author": true
},
"contributors": [
{
"name": "Tommaso Allevi",
"email": "tommaso.allevi@oramasearch.com",
"url": "https://github.com/allevo"
},
{
"name": "Paolo Insogna",
"email": "paolo.insogna@oramasearch.com",
Expand Down