Add pgvector extension #472

Merged
merged 5 commits into from
Jan 21, 2023

Conversation

gregnr
Member

@gregnr gregnr commented Jan 10, 2023

What kind of change does this PR introduce?

Feature

What is the current behavior?

Popular ML extension pgvector is unavailable in Supabase. The closest related extension today is cube, but its dimension limit (100) makes it impractical for modern vector math tasks.

What is the new behavior?

Adds the pgvector extension, one of the highest-voted extensions in the "Vote for Postgres extensions" discussion.

Additional context

As the language model ecosystem grows in popularity (e.g. GPT-3), there is a growing need for efficient vector operations (like vector similarity search) on large vector-based datasets (e.g. GPT-3 embeddings). Being able to use Postgres for this type of work, rather than outsourcing it to another system, would make Supabase a compelling service offering for the AI/ML community.
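To make "vector similarity search" concrete, here is a minimal sketch in plain Python of what such a query computes: rank stored embeddings by cosine distance to a query embedding and return the closest ones. The function names (`cosine_distance`, `nearest`) and the toy 3-dimensional data are assumptions for illustration only; this is a brute-force scan, not how pgvector is implemented.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=2):
    """Return the k (name, embedding) rows closest to the query.

    Brute-force scan over every row; a database index would avoid this.
    """
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Toy embeddings (hypothetical data; real embeddings have far more dimensions).
documents = [
    ("cats", [0.9, 0.1, 0.0]),
    ("dogs", [0.8, 0.2, 0.1]),
    ("cars", [0.0, 0.1, 0.9]),
]

top = nearest([1.0, 0.0, 0.0], documents, k=2)
print([name for name, _ in top])
```

Real embeddings typically have hundreds or thousands of dimensions, which is why the 100-dimension limit in cube matters here.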

This PR mirrors best practices and conventions used by other extensions to install pgvector through Ansible.

Please let me know if there is any additional work I can do to help bring this extension to Supabase.

@gregnr gregnr requested a review from a team as a code owner January 10, 2023 07:10
@gregnr gregnr force-pushed the feat/pgvector branch 2 times, most recently from 3e61c07 to 0a3e05d Compare January 12, 2023 01:10
@olirice olirice self-requested a review January 16, 2023 14:45
@olirice
Contributor

olirice commented Jan 16, 2023

Thanks for the PR!

pgvector looks great. I'm comparing it with the (available) cube extension and there is a lot of overlap. Is the only issue with using cube the 100 element limit?

I see in the cube docs

To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger.

so we could potentially bump that cap over the standard embedding sizes to something like 4096

the other features I see in pgvector that aren't in cube are some niche index operations and cosine distance support. Since euclidean distance (supported by both) gives the same ordering as cosine distance when operating on unit vectors (squared euclidean distance = 2 × cosine distance), that doesn't seem like too bad of a constraint to work around.
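The relationship between euclidean and cosine distance on unit vectors can be checked numerically. A minimal sketch in plain Python follows; the helper names are illustrative and not part of cube or pgvector. For unit vectors, squared euclidean distance equals twice the cosine distance, so sorting by either one yields the same nearest-neighbor order. This only holds if the vectors are normalized before being stored.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Arbitrary example vectors, normalized to unit length.
query = normalize([1.0, 0.0, 0.0])
candidates = [
    normalize(v) for v in ([3.0, 4.0, 0.0], [1.0, 2.0, 2.0], [0.0, 0.0, 5.0])
]

# Identity on unit vectors: |a - b|^2 = 2 * (1 - a.b)
squared_euclidean = euclidean(query, candidates[0]) ** 2
twice_cosine = 2 * cosine_distance(query, candidates[0])

# Rank candidate indices by each metric.
indices = range(len(candidates))
by_euclidean = sorted(indices, key=lambda i: euclidean(query, candidates[i]))
by_cosine = sorted(indices, key=lambda i: cosine_distance(query, candidates[i]))

print(math.isclose(squared_euclidean, twice_cosine), by_euclidean == by_cosine)
```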

If there are other differences that are significant for common use cases could you please add a few notes wrt what they are?

@michelp
Contributor

michelp commented Jan 16, 2023

+1 for pgvector. cube has not had a major update since 2006, if we patch it then we own it. pgvector is targeted directly at very common vector embedding use cases and is actively maintained.

@olirice
Contributor

olirice commented Jan 16, 2023

@supabase/backend please +1 and merge when you get a sec

@mattaylor

bump

@gregnr
Member Author

gregnr commented Jan 21, 2023

@olirice @Lakshmipathi Thanks for taking the time to review & thanks for the approvals. There were a few new merge conflicts as of yesterday, but I have resolved these now.

@olirice
Contributor

olirice commented Jan 21, 2023

thanks @gregnr, looks great!
sorry for the slow cycle time on this one

@harshit0209

cheers

6 participants