
Adds vector search capabilities to Orama #462

Merged · 5 commits merged into main from feat/vectors on Aug 4, 2023

Conversation

micheleriva (Member)

This PR introduces vector search capabilities in Orama. This will be part of Orama v1.2.0, the next major release.

New Vector API

Orama adds support for vector search through a new data type, vector[<size>]:

import { create } from '@orama/orama'

const db = await create({
  schema: {
    text: 'string',
    embedding: 'vector[1536]' // <--- vector size is mandatory here. OpenAI embeddings, for instance, have 1536 dimensions. 
  }
})

After you create your Orama instance, you can insert and search for vectors using the new searchVector function:

import { create, insert, searchVector } from '@orama/orama'

const db = await create({
  schema: {
    text: 'string',
    myVector: 'vector[5]'
  }
})

await insert(db, { text: 'foo', myVector: [1, 0, 0, 0, 0] })
await insert(db, { text: 'bar', myVector: [0, 0, 0, 0, 0] })
await insert(db, { text: 'baz', myVector: [1, 1, 1, 1, 1] })

const results = await searchVector(db, {
  vector: [1, 0, 0, 0, 0], // Your input vector
  property: 'myVector' // Property to search through is mandatory with the "searchVector" function
})
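
If searchVector resolves to the same kind of result object as a regular Orama search (hits carrying a similarity score and the stored document), reading the matches would look roughly like this:

// Sketch of reading the results, assuming the usual Orama result shape ({ hits, count, elapsed })
for (const hit of results.hits) {
  console.log(hit.score, hit.document.text) // hits are ordered by similarity to the input vector
}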


micheleriva merged commit 5524c0c into main on Aug 4, 2023 (2 checks passed)
micheleriva deleted the feat/vectors branch on August 4, 2023 at 21:41

gustavopch commented Aug 5, 2023

Awesome addition! One question: have you tested the performance? Like, how much time would it take to search among 30k vectors, 100k vectors, etc.? 🤔

I was trying to write a vector search implementation this week, and it was taking about 80ms to perform a search across 30k vectors (1536 dimensions) on my MacBook M1. I was also indexing the vectors with their precalculated magnitudes, like you did. That performance worried me because in production it would block the server, leaving it unable to process anything else while performing a search (and it would probably be slower than on my M1).
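
For reference, the brute-force approach described above boils down to one dot product per stored vector, with each vector's magnitude computed once at insert time. A minimal sketch of that idea in plain JavaScript (an illustration of the approach, not Orama's actual implementation):

// Minimal sketch: brute-force cosine similarity over an in-memory index
// that stores each vector together with its precalculated magnitude.
// Illustration of the approach discussed above, not Orama's internals.

function magnitude(v) {
  let sum = 0
  for (let i = 0; i < v.length; i++) sum += v[i] * v[i]
  return Math.sqrt(sum)
}

// Hypothetical data: the vectors you would otherwise insert into Orama
const vectors = [
  [1, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [1, 1, 1, 1, 1]
]

// Magnitudes are computed once, at "insert" time
const index = vectors.map(vector => ({ vector, mag: magnitude(vector) }))

function searchBruteForce(query, k = 10) {
  const queryMag = magnitude(query)
  return index
    .map(({ vector, mag }) => {
      let dot = 0
      for (let i = 0; i < vector.length; i++) dot += query[i] * vector[i]
      const denom = mag * queryMag
      return { vector, score: denom === 0 ? 0 : dot / denom }
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}

console.log(searchBruteForce([1, 0, 0, 0, 0], 2))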

I believe one way to improve performance would be to reduce the number of dimensions. I was looking into how to do that and found this Gist: https://gist.github.com/sepans/419d413f786b27872b34. I'm not sure how much impact it has on search quality, but I'm leaving the link here in case you want to check it out.
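
As a rough illustration of the dimensionality-reduction idea (a generic random projection, not necessarily what the linked Gist does), vectors can be projected down to fewer dimensions before indexing:

// Rough sketch: shrinking embeddings with a sparse random projection before indexing.
// Generic technique shown for illustration only; not taken from the linked Gist.

function randomProjection(inputDim, outputDim) {
  // One random +/- row per output dimension, scaled so projected lengths stay roughly comparable
  return Array.from({ length: outputDim }, () =>
    Array.from({ length: inputDim }, () => (Math.random() < 0.5 ? -1 : 1) / Math.sqrt(outputDim))
  )
}

function project(vector, matrix) {
  return matrix.map(row => row.reduce((sum, weight, i) => sum + weight * vector[i], 0))
}

// e.g. reduce a hypothetical 1536-dimensional embedding to 256 dimensions
const projection = randomProjection(1536, 256)
const embedding = Array.from({ length: 1536 }, () => Math.random())
const reduced = project(embedding, projection) // 256 numbers, cheaper to store and compare

The trade-off is the one mentioned above: fewer dimensions means faster dot products but some loss of precision in the similarity scores.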

micheleriva (Member, Author)

Hey @gustavopch, thank you so much!

I share your concerns. I ran some tests locally and found no significant performance problems. I'm sure we will hit some edge cases, though, and we will eventually implement something similar to the Gist you shared if scaling performance becomes increasingly challenging.

With that being said, I'd like to highlight a couple of things:

  1. You may not need OpenAI embeddings (since you use 1536 dimensions, I guess you're talking about them). Let me share this great article by Supabase: https://supabase.com/blog/fewer-dimensions-are-better-pgvector
  2. Cosine similarity is incredibly easy to parallelize. We plan to write some platform-specific plugins (Node.js, browsers, etc.) to parallelize execution on large vector sets; a rough sketch of the idea follows this list.
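
A rough sketch of that idea with Node.js worker threads (hypothetical code, not the planned plugin API): split the vector set into chunks, score each chunk in its own worker, then merge the per-worker top results.

// Rough sketch: parallelizing brute-force cosine similarity across Node.js worker threads.
// Hypothetical illustration only; this is not the planned Orama plugin API.
// Run as an ES module (e.g. a .mjs file), since the script re-imports itself as the worker.
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads'

function cosine(a, b) {
  let dot = 0, magA = 0, magB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    magA += a[i] * a[i]
    magB += b[i] * b[i]
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB)
  return denom === 0 ? 0 : dot / denom
}

if (isMainThread) {
  // Hypothetical data set: 30k random vectors with 1536 dimensions
  const vectors = Array.from({ length: 30_000 }, () => Array.from({ length: 1536 }, Math.random))
  const query = Array.from({ length: 1536 }, Math.random)

  const workers = 4
  const chunkSize = Math.ceil(vectors.length / workers)

  const tasks = Array.from({ length: workers }, (_, w) => new Promise((resolve, reject) => {
    const offset = w * chunkSize
    const worker = new Worker(new URL(import.meta.url), {
      workerData: { chunk: vectors.slice(offset, offset + chunkSize), query, offset }
    })
    worker.once('message', resolve)
    worker.once('error', reject)
  }))

  // Merge each worker's local top 10 into the global top 10
  const partials = await Promise.all(tasks)
  const best = partials.flat().sort((a, b) => b.score - a.score).slice(0, 10)
  console.log(best)
} else {
  const { chunk, query, offset } = workerData
  const scored = chunk.map((vector, i) => ({ index: offset + i, score: cosine(query, vector) }))
  parentPort.postMessage(scored.sort((a, b) => b.score - a.score).slice(0, 10))
}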

Hope that helps :)
