fix: ID Mismatch Error in VectorDB During Evaluation #1033 #1056

e7217 · 2024-12-16T03:30:53Z

description
Hello

I am suggesting some code changes to address issue #1033. The error occurs when a search returns an item from the vectordb whose ID does not match any ID in the raw_doc corpus. As I understand it, the retriever aims to return the item with the highest score. To address this, I have added a key that stores the content in each vector item. This change may require additional storage capacity in the vectordb, but it is similar to how LangChain uses a page_content key.
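A minimal sketch of the idea, not the code in this PR: if each vector item carries its content, a hit can be returned without looking its ID up in the corpus. The names `VectorItem` and `InMemoryVectorStore` are hypothetical and only illustrate the approach; they are not AutoRAG's interface.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class VectorItem:
    """Hypothetical record: the content is stored next to the ID and embedding."""
    doc_id: str
    embedding: list[float]
    content: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector DB (assumption, not AutoRAG's actual class)."""

    def __init__(self) -> None:
        self._items: list[VectorItem] = []

    def add(self, item: VectorItem) -> None:
        self._items.append(item)

    def query(self, embedding: list[float], top_k: int = 1):
        scored = [(item, cosine(embedding, item.embedding)) for item in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        # Return the content with each hit, so no corpus lookup by ID is needed.
        return [(item.doc_id, item.content, score) for item, score in scored[:top_k]]


store = InMemoryVectorStore()
store.add(VectorItem("doc-1", [1.0, 0.0], "AutoRAG overview"))
store.add(VectorItem("doc-2", [0.0, 1.0], "Vector DB setup"))

# "doc-2" is deliberately absent from this corpus mapping to mimic the mismatch.
corpus = {"doc-1": "AutoRAG overview"}

hits = store.query([0.1, 0.9], top_k=1)
top_id, top_content, top_score = hits[0]
```

In this sketch the top hit's ID is missing from `corpus`, yet its content is still available from the store itself, at the cost of storing the text twice.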

I have modified some code, but I have only referred to the documentation and have not run the code in practice, so there may be errors.

I appreciate your review. Thank you.

references


e7217 · 2024-12-16T13:56:35Z

This PR may not fully align with your intentions for AutoRAG. I tried to consider as many cases as possible, but there may be concerns of yours that I am unaware of. I understand that it might not be approved, but I would appreciate any feedback you can provide. Thank you.

vkehfdl1 · 2024-12-17T12:25:14Z

@e7217 Thank you for the PR! Apologies for the late review.
I will look through it and make some changes if needed. Thank you.

vkehfdl1 · 2024-12-17T12:54:45Z

@e7217 Actually, we have discussed a structure for AutoRAG that does not use corpus_df at all.
@bwook00 I need your opinion on this. I think we have to discuss it.

Pros

There will be no more errors like "no doc_id in vectordb".
Managing the corpus will be easier.
Someone who only has a vector DB can start using AutoRAG easily.

Cons

It will be difficult to synchronize different vector DBs. => Just use one DB? What about the embedding model?
Results within one project may become inconsistent when the vector DB is changed. => Give up on precise experiments?
Managing the right retrieval_gt might be harder.
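To make the trade-off concrete, here is a minimal sketch of the consistency check that keeping a separate corpus next to a vector DB implies. The function name `find_id_mismatches` and the sample IDs are hypothetical and are not part of AutoRAG.

```python
def find_id_mismatches(corpus_ids, vectordb_ids):
    """Report IDs that exist on only one side (hypothetical helper)."""
    corpus_set, db_set = set(corpus_ids), set(vectordb_ids)
    return {
        # Corpus rows that were never ingested into the vector DB.
        "missing_in_vectordb": sorted(corpus_set - db_set),
        # Vector DB items with no matching corpus row; retrieving one of
        # these is the kind of ID mismatch discussed in this PR.
        "missing_in_corpus": sorted(db_set - corpus_set),
    }


report = find_id_mismatches(["a", "b", "c"], ["b", "c", "d"])
```

If the corpus were dropped entirely, this check would no longer be needed, but the same comparison would have to run between any two vector DBs that are expected to hold the same documents.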

Um Changyong and others added 11 commits December 13, 2024 16:48

feat: add field to vector item

e0515e1

feat: add field to vector item

7dc7528

feat: update chroma

329e0cf

feat: update couchbase

28efa65

fix: update return object of couchbase.query()

2e9e343

feat: update pinecone

d993a25

feat: update qdrant

6cacb4e

feat: update vectordb(weaviate)

1dc2366

Merge branch 'fix/id-mismatch-with-vectordb' of https://github.com/e7…

3188e44

…217/AutoRAG into fix/id-mismatch-with-vectordb

fix: reformatting...

8fb2be1

Merge branch 'main' into fix/id-mismatch-with-vectordb

b0d82ee

hongsw requested review from hongsw and bwook00 and removed request for hongsw December 16, 2024 03:47

fix: update some code and test code.

081110e

feat: add and change in test_vectordbs

3e2c634

vkehfdl1 added 2 commits December 17, 2024 21:25

Merge branch 'main' into fix/id-mismatch-with-vectordb

f07331e

Merge branch 'main' into fix/id-mismatch-with-vectordb

917b285
