
VectorDB support (pgvector) for archival memory #226

Merged: 102 commits into letta-ai:main, Nov 3, 2023
Conversation

@sarahwooders (Collaborator) commented on Oct 31, 2023

Addresses #142

  • This pull request allows archival memory to be stored in a vector DB (pgvector) in addition to local storage. This helps with retrieving from larger datasets that can't always be stored locally.
  • Archival memory can be loaded into an agent multiple times, by connecting the agent to multiple data sources.
  • Text/embeddings can be loaded from an existing vector DB.

To-Dos:

  • Load data via memgpt load into pgvector
  • Update memgpt list sources to look at DB tables
  • Enable multiple data sources per agent
  • Enable configuration of archival storage location (DB endpoint, local)

CLI Updates

Configuration

When running memgpt configure, the user will have the option to choose between local/postgres/chroma and to provide the Chroma/Postgres URI that will be used to store data. This configures the data backend that MemGPT uses to save and read archival storage.

NOTE: Users can swap out the data backend, but a previously loaded data source or saved agent will no longer be accessible until they switch back to the original backend.
NOTE: Do not add a storage backend to MemGPT unless you are okay with MemGPT writing new tables/data.
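Since the PR notes that MemGPT writes new tables into whatever backend it is given, it may help to see what a pgvector-backed archival table involves. A minimal sketch, assuming a psycopg-style workflow; the table and column names and the embedding dimension here are illustrative, not MemGPT's actual schema:

```python
def archival_table_ddl(table: str, dim: int = 1536) -> str:
    """Build illustrative DDL for a pgvector-backed archival table.

    The table/column names are hypothetical stand-ins, not MemGPT's
    real schema; `vector(dim)` is pgvector's fixed-dimension type.
    """
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    id BIGSERIAL PRIMARY KEY,\n"
        "    text TEXT NOT NULL,\n"
        f"    embedding vector({dim})\n"
        ");"
    )

print(archival_table_ddl("memgpt_archival"))
```

This is why pointing MemGPT at a production database is risky: the backend is treated as writeable from the start.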

Importing data

Imported data and agent archival data will be stored in the chosen data backend, not just locally.

So if Postgres is the data backend, the following command will load data into Postgres:

memgpt load directory --name <data-source-name> --input-dir <dir> --recursive 
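Under the hood, a directory load amounts to splitting each file into chunks, embedding them, and inserting rows into the backend. MemGPT delegates this to LlamaIndex, but the chunking step can be sketched roughly as follows (window size and overlap are made-up illustrative values):

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks (illustrative sketch).

    The real pipeline uses LlamaIndex's splitter; the sizes here are
    hypothetical, chosen only to demonstrate the overlap idea.
    """
    words = text.split()
    step = chunk_size - overlap
    # Each chunk starts `step` words after the previous one, so adjacent
    # chunks share `overlap` words of context.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, max(len(words), 1), step)]
```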

An agent can access the data in two ways. Both methods copy the attached data into the agent's index.

  1. Attach the data source to an agent (the agent must already exist from a prior memgpt run):
memgpt attach --agent <agent-name> --data-source <data-source-name> 
  2. Attach the data from inside the CLI:
memgpt run --agent <agent-name> 
... 
> Enter your message: /attach
? Select data source short-stories

Once data is attached to an agent, it cannot be removed. We can address this in a future issue.

Loading from a VectorDB

Users should not provide their vector DB as a storage backend unless they are okay with MemGPT writing to the DB. To make read-only data accessible to MemGPT, load it instead via:

memgpt load vector-database --name <name> --uri <url> --embedding-col <embedding-col> --text-col <text-col> 
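Because the text and embedding columns are user-specified, the read path presumably reduces to a nearest-neighbor query parameterized by those names. A hypothetical sketch of such a query builder (`<->` is pgvector's L2-distance operator; the names are illustrative, and this is not MemGPT's actual code):

```python
def knn_query(table: str, text_col: str, embedding_col: str, k: int = 5) -> str:
    """Build a pgvector nearest-neighbor query over a user-supplied table.

    `%s` is a psycopg2-style placeholder for the query embedding.
    Table/column names cannot be parameterized, so a real implementation
    must validate them before interpolation.
    """
    return (
        f"SELECT {text_col} FROM {table} "
        f"ORDER BY {embedding_col} <-> %s LIMIT {k};"
    )
```

Note that the query only reads from the table, which is what makes this path safe for read-only data.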

Future Issues

  • Deleting data sources
  • Removing data sources from an agent
  • Adding chroma

@sarahwooders marked this pull request as draft on October 31, 2023 at 21:13

@silence48 left a comment


NICE WORK!!

Can we make it possible to pass a connection string instead? It would also be nice to set overrides for the table and column names, since I have preloaded vectors.

Also, I saw a comment saying that it will mess up existing data, but I didn't look into why... that's not ideal.

I will try to test tomorrow and add some feedback from experience.

@sarahwooders (Collaborator, Author)

@silence48 what kind of database do you have? I'm using the LlamaIndex wrappers for retrieving from Postgres/Chroma for now, but it seems like they don't allow for configuring the column names (run-llama/llama_index#6058). However, we could add a non-LlamaIndex connector that is more configurable.

Another issue with bringing your own table is that I'm not sure how we should store new archival memories generated by the agent. Presumably you don't want MemGPT to insert new documents into your existing table? There are two possible solutions I can think of:

  1. Copy the user's table into another table which MemGPT can write to
  2. Have MemGPT search across both a writeable archival memory table and also the user's table
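Option 2 could be sketched as a single query spanning both the writeable archival table and the user's read-only table. This is a hypothetical illustration, not proposed implementation code; it assumes both tables expose `text` and `embedding` columns, and all names are made up:

```python
def dual_table_knn(write_table: str, user_table: str, k: int = 5) -> str:
    """Search a MemGPT-owned table and a user's table in one pgvector query.

    Illustrative sketch only: `%s` stands for the query embedding, and
    both tables are assumed to share `text` and `embedding` columns.
    """
    union = (
        f"SELECT text, embedding FROM {write_table} "
        f"UNION ALL SELECT text, embedding FROM {user_table}"
    )
    # Rank the combined rows by distance to the query embedding.
    return f"SELECT text FROM ({union}) AS combined ORDER BY embedding <-> %s LIMIT {k};"
```

New memories would be inserted only into the writeable table, leaving the user's table untouched.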

@silence48

>   1. Copy the user's table into another table which MemGPT can write to
>   2. Have MemGPT search across both a writeable archival memory table and also the user's table

This would work for me, I think.

It's in a Postgres database, but I have two tables with vectors: one called codevecs and one called docvecs. I set it up following the README in the pgvector repo.

@sarahwooders merged commit 2492db6 into letta-ai:main on Nov 3, 2023
1 of 2 checks passed