Merge branch 'main' into feat/binary-vector
VoVAllen authored Feb 27, 2024
2 parents 8170e61 + 35fe2ee commit 8c4a00c
Showing 7 changed files with 76 additions and 9 deletions.
20 changes: 12 additions & 8 deletions .vitepress/config.mts
@@ -41,6 +41,7 @@ export default defineConfig({
nav: [
{ text: 'Home', link: '/' },
{ text: 'Docs', link: '/getting-started/overview' },
{text: 'Tutorial', link: '/tutorial/'},
{ text: 'Blog', link: 'https://blog.pgvecto.rs' },
],

@@ -70,14 +71,6 @@ export default defineConfig({
{ text: 'Compatibility', link: '/usage/compatibility' },
]
},
{
text: 'Use Cases',
collapsed: false,
items: [
{ text: 'Image search', link: '/use-cases/image-search' },
{ text: 'Hybrid search', link: '/use-cases/hybrid-search' },
],
},
{
text: 'Integration',
collapsed: false,
@@ -131,6 +124,17 @@ export default defineConfig({
],
},
],
'/tutorial/': [
{
text: 'Use Cases',
collapsed: false,
items: [
{ text: 'Hybrid Search', link: '/tutorial/hybrid-search' },
{ text: 'Image Search', link: '/tutorial/image-search' },
{ text: 'Multi Tenancy', link: '/tutorial/multi-tenancy' },
],
},
]
},

socialLinks: [
2 changes: 1 addition & 1 deletion src/developers/development.md
@@ -70,7 +70,7 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
6. Install cargo-pgrx.

```sh
cargo install cargo-pgrx@$(grep 'pgrx = {' Cargo.toml | cut -d '"' -f 2)
cargo install cargo-pgrx@$(grep 'pgrx = { version' Cargo.toml | cut -d '"' -f 2)
cargo pgrx init
```

File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions src/tutorial/index.md
@@ -0,0 +1,3 @@
# Tutorial

Explore the use cases of pgvecto.rs through these tutorials.
59 changes: 59 additions & 0 deletions src/tutorial/multi-tenancy.md
@@ -0,0 +1,59 @@
# Multi Tenancy

Multi-tenancy is essential for SaaS applications, allowing a single instance to serve multiple users or groups while ensuring data privacy and security. This tutorial outlines the design and implementation of a simple multi-tenant vector search system with pgvecto.rs.

## Designing the Database Schema
The success of user-based isolation in multi-tenant systems hinges on a well-structured database schema:

### Users and Documents
Each user is identified by a unique user_id, which directly links them to their documents for private access. The documents table contains each user's documents, each with a distinct id, title, and content. Additional metadata can be included in the documents table to improve search functionality. The user_id is linked to the documents instead of chunk embeddings to replicate real-world scenarios more accurately.

```sql
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
title TEXT,
content TEXT
-- Additional metadata columns can be added here (put a comma after the preceding column)
);
```
### Chunks
The chunks table stores the vector embedding of each document chunk. Each row uses document_id to reference the document it belongs to.

```sql
CREATE TABLE chunks (
id SERIAL PRIMARY KEY,
document_id INTEGER REFERENCES documents(id),
embedding vector(512) NOT NULL
);
```
### Entity-Relation Diagram
<img src="./multi-tenancy/er.svg" style="height: 600px"/>

## Implementing Multi-Tenancy

### Create Index

Create a vector index on `chunks.embedding` to accelerate queries. Here we use the dot product as the similarity measure.
```sql
CREATE INDEX idx_chunks_embedding ON chunks USING vectors (embedding vector_dot_ops);
```
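
The tenant filter and the join in this tutorial operate on `documents.user_id` and `chunks.document_id`, so ordinary B-tree indexes on those columns may also help. The following is a minimal sketch, not part of the original tutorial; the index names are hypothetical, and whether the planner uses these indexes depends on your data and PostgreSQL version.

```sql
-- Assumption: supporting B-tree indexes added for illustration only.
-- Speeds up looking up the documents that belong to one user.
CREATE INDEX idx_documents_user_id ON documents (user_id);

-- Speeds up joining chunks to their parent documents.
CREATE INDEX idx_chunks_document_id ON chunks (document_id);
```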

### Run the query with user_id

```sql
SELECT chunks.id AS chunk_id, documents.id AS document_id, documents.title
FROM chunks
INNER JOIN documents ON chunks.document_id = documents.id
WHERE documents.user_id = 1 /* Replace with the user's integer user_id */
ORDER BY chunks.embedding <#> '[3,2,1]' /* Replace with a 512-dimensional query embedding */ LIMIT 5;
```
In this query:

- The `SELECT` list returns the chunk ID (`chunk_id`), the document ID (`document_id`), and the document title.
- The `INNER JOIN` links each chunk to its document, and the `WHERE` clause keeps only chunks whose document is owned by the specified user (here `user_id = 1`, matching the `INTEGER` column in the schema).
- The `ORDER BY` clause uses `<#>`, the negative dot product operator in pgvecto.rs, so that it matches the `vector_dot_ops` index. Sorting in ascending order therefore returns the chunks with the largest dot product, that is, the most similar embeddings, first.
- Replace `[3,2,1]` with the query embedding, which must have the same dimension as `chunks.embedding` (512), and replace `1` with the ID of the user for whom you are performing the search.
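
In an application, the user ID and query embedding would normally be passed as parameters rather than edited into the SQL text. A minimal sketch using a prepared statement is shown below; the statement name `tenant_search` is hypothetical, and it assumes the integer `user_id` column from the schema above and the `<#>` operator that corresponds to the `vector_dot_ops` index.

```sql
-- Assumption: tenant_search is an illustrative name, not part of pgvecto.rs.
-- $1 is the tenant's user_id, $2 is the query embedding.
PREPARE tenant_search (integer, vector) AS
SELECT chunks.id AS chunk_id, documents.id AS document_id, documents.title
FROM chunks
INNER JOIN documents ON chunks.document_id = documents.id
WHERE documents.user_id = $1
ORDER BY chunks.embedding <#> $2
LIMIT 5;

-- Usage: EXECUTE tenant_search(<user_id>, <query embedding literal>);
-- The embedding literal must have 512 dimensions to match chunks.embedding.
```

Keeping the tenant filter inside the statement means every call is scoped to one user, which reduces the chance of accidentally issuing an unfiltered search.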

## Performance Tip
A filtered vector search visits more candidate points than it returns and checks the filter condition on each one. Consequently, if there are many user IDs or only a small fraction of rows satisfy the filter, queries may slow down. To improve performance, consider PostgreSQL's table partitioning feature.
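
One way to apply partitioning is sketched below. It rests on assumptions that go beyond the original schema: `user_id` is copied onto the chunk table so it can serve as the partition key, the table name `chunks_by_user` and the choice of four hash partitions are purely illustrative, and the vector index is created on each partition explicitly. Verify how pgvecto.rs indexes behave with partitioned tables in your version before relying on this.

```sql
-- Assumption: a denormalized variant of the chunks table that carries user_id.
CREATE TABLE chunks_by_user (
    id SERIAL,
    user_id INTEGER NOT NULL,
    document_id INTEGER REFERENCES documents(id),
    embedding vector(512) NOT NULL,
    -- The primary key of a partitioned table must include the partition key.
    PRIMARY KEY (user_id, id)
) PARTITION BY HASH (user_id);

-- Four hash partitions; the count is arbitrary and should be tuned.
CREATE TABLE chunks_by_user_p0 PARTITION OF chunks_by_user
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE chunks_by_user_p1 PARTITION OF chunks_by_user
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE chunks_by_user_p2 PARTITION OF chunks_by_user
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE chunks_by_user_p3 PARTITION OF chunks_by_user
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- One vector index per partition, using the same operator class as above.
CREATE INDEX ON chunks_by_user_p0 USING vectors (embedding vector_dot_ops);
CREATE INDEX ON chunks_by_user_p1 USING vectors (embedding vector_dot_ops);
CREATE INDEX ON chunks_by_user_p2 USING vectors (embedding vector_dot_ops);
CREATE INDEX ON chunks_by_user_p3 USING vectors (embedding vector_dot_ops);
```

With this layout, a query that filters on `user_id` with an equality condition lets PostgreSQL prune the search to a single partition, so the vector index it scans contains only a fraction of all chunks.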