
VectorDB support (pgvector) for archival memory #226

Merged: 102 commits into letta-ai:main, Nov 3, 2023
Conversation

@sarahwooders (Collaborator) commented on Oct 31, 2023

Addresses #142

  • This pull request allows archival memory to be stored in a vector DB (pgvector) in addition to local storage. This helps with retrieving from larger datasets that can't always be stored locally.
  • Archival memory can be loaded into an agent multiple times, by connecting the agent to multiple data sources.
  • Text/embeddings can be loaded from an existing vector DB.

To-Dos:

  • Load data via memgpt load into pgvector
  • Update memgpt list sources to look at DB tables
  • Enable multiple data sources per agent
  • Enable configuration of archival storage location (DB endpoint, local)

CLI Updates

Configuration

When running memgpt configure, the user will have the option to choose between local/postgres/chroma and to provide the Chroma/Postgres URI that will be used to store data. This configures the data backend that MemGPT uses to save and read archival storage.

NOTE: Users can swap out the data backend, but a previously loaded data source or saved agent will no longer be accessible until they switch back to the original backend.
NOTE: Do not add a storage backend to MemGPT unless you are okay with MemGPT writing new tables/data.
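Since the PR notes that MemGPT writes new tables into whatever backend it is given, it may help to see what a pgvector-backed archival table involves. A minimal sketch, assuming a psycopg-style workflow; the table and column names and the embedding dimension here are illustrative, not MemGPT's actual schema:

```python
def archival_table_ddl(table: str, dim: int = 1536) -> str:
    """Build illustrative DDL for a pgvector-backed archival table.

    The table/column names are hypothetical stand-ins, not MemGPT's
    real schema; `vector(dim)` is pgvector's fixed-dimension type.
    """
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    id BIGSERIAL PRIMARY KEY,\n"
        "    text TEXT NOT NULL,\n"
        f"    embedding vector({dim})\n"
        ");"
    )

print(archival_table_ddl("memgpt_archival"))
```

This is why pointing MemGPT at a production database is risky: the backend is treated as writeable from the start.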

Importing data

Imported data and agent archival data will be stored in the chosen data backend, not just locally.

So if Postgres is the data backend, the following command will load data into Postgres:

memgpt load directory --name <data-source-name> --input-dir <dir> --recursive 
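Under the hood, a directory load amounts to splitting each file into chunks, embedding them, and inserting rows into the backend. MemGPT delegates this to LlamaIndex, but the chunking step can be sketched roughly as follows (window size and overlap are made-up illustrative values):

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks (illustrative sketch).

    The real pipeline uses LlamaIndex's splitter; the sizes here are
    hypothetical, chosen only to demonstrate the overlap idea.
    """
    words = text.split()
    step = chunk_size - overlap
    # Each chunk starts `step` words after the previous one, so adjacent
    # chunks share `overlap` words of context.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, max(len(words), 1), step)]
```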

An agent can access the data in two ways. Both methods copy the attached data into the agent's index.

  1. Attach the data source to an agent (the agent must already exist from a prior memgpt run):
memgpt attach --agent <agent-name> --data-source <data-source-name> 
  2. Attach the data from inside the CLI:
memgpt run --agent <agent-name> 
... 
> Enter your message: /attach
? Select data source short-stories

Once data is attached to an agent, it cannot be removed. We can address this in a future issue.

Loading from a VectorDB

Users should not provide their vector DB as a storage backend unless they are okay with MemGPT writing to the DB. To make read-only data accessible to MemGPT, load it instead via:

memgpt load vector-database --name <name> --uri <url> --embedding-col <embedding-col> --text-col <text-col> 
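Because the text and embedding columns are user-specified, the read path presumably reduces to a nearest-neighbor query parameterized by those names. A hypothetical sketch of such a query builder (`<->` is pgvector's L2-distance operator; the names are illustrative, and this is not MemGPT's actual code):

```python
def knn_query(table: str, text_col: str, embedding_col: str, k: int = 5) -> str:
    """Build a pgvector nearest-neighbor query over a user-supplied table.

    `%s` is a psycopg2-style placeholder for the query embedding.
    Table/column names cannot be parameterized, so a real implementation
    must validate them before interpolation.
    """
    return (
        f"SELECT {text_col} FROM {table} "
        f"ORDER BY {embedding_col} <-> %s LIMIT {k};"
    )
```

Note that the query only reads from the table, which is what makes this path safe for read-only data.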

Future Issues

  • Deleting data sources
  • Removing data sources from an agent
  • Adding chroma

@sarahwooders marked this pull request as draft on October 31, 2023 at 21:13

@silence48 left a comment


NICE WORK!!

Can we make it possible to pass a connection string instead? It would also be nice to set overrides for the table and column names, since I have preloaded vectors.

Also, I saw a comment saying that it will mess up existing data, but I didn't look into why... that's not ideal.

I will try to test tomorrow and add some feedback from experience.

@sarahwooders (Collaborator, Author)

@silence48 what kind of database do you have? I'm using the LlamaIndex wrappers for retrieving from Postgres/Chroma for now, but it seems like they don't allow for configuring the column names (run-llama/llama_index#6058). However, we could add a non-LlamaIndex connector that is more configurable.

Another issue with bringing your own table is that I'm not sure how we should store new archival memories generated by the agent. Presumably you don't want MemGPT to insert new documents into your existing table? There are two possible solutions I can think of:

  1. Copy the user's table into another table which MemGPT can write to
  2. Have MemGPT search across both a writeable archival memory table and also the user's table
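Option 2 could be sketched as a single query spanning both the writeable archival table and the user's read-only table. This is a hypothetical illustration, not proposed implementation code; it assumes both tables expose `text` and `embedding` columns, and all names are made up:

```python
def dual_table_knn(write_table: str, user_table: str, k: int = 5) -> str:
    """Search a MemGPT-owned table and a user's table in one pgvector query.

    Illustrative sketch only: `%s` stands for the query embedding, and
    both tables are assumed to share `text` and `embedding` columns.
    """
    union = (
        f"SELECT text, embedding FROM {write_table} "
        f"UNION ALL SELECT text, embedding FROM {user_table}"
    )
    # Rank the combined rows by distance to the query embedding.
    return f"SELECT text FROM ({union}) AS combined ORDER BY embedding <-> %s LIMIT {k};"
```

New memories would be inserted only into the writeable table, leaving the user's table untouched.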

@silence48

>   1. Copy the user's table into another table which MemGPT can write to
>   2. Have MemGPT search across both a writeable archival memory table and also the user's table

This would work for me, I think.

It's in a Postgres database, but I have two tables with vectors: one called codevecs and one called docvecs. I set it up following the README in the pgvector repo.

@sarahwooders merged commit 2492db6 into letta-ai:main on Nov 3, 2023
1 of 2 checks passed