Save the ColBERT encodings to disk. #237

Open
Diegi97 opened this issue Aug 9, 2024 · 2 comments

Contributor

Diegi97 commented Aug 9, 2024

I have a use case where I run ColBERT on CPU over a couple thousand documents. For this I don't use PLAID but the encode and search_encoded_docs methods, and search is fast enough. The problem is that encoding all these documents on CPU takes time, and I don't want to re-encode everything every time I deploy the model, so I developed a way to save and load these encodings:

https://github.com/ChatFAQ/ChatFAQ/blob/cc19e4b85198062888d6320e59276db31461f4e9/chat_rag/chat_rag/retrievers/colbert_retriever.py#L163

If there is interest, I could improve this, integrate it into the RAGPretrainedModel or ColBERT classes, and open a PR.
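A minimal sketch of the save/load idea, for readers who want something concrete. This is not the linked implementation and not RAGatouille's API: `save_encodings`, `load_encodings`, and the payload keys in the usage note are hypothetical names. It assumes the in-memory encodings (embeddings, masks, original texts) can be gathered into one picklable dict, and it writes through a temporary file so an interrupted deploy does not leave a truncated cache behind.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_encodings(cache_file: Path, payload: Dict[str, Any]) -> None:
    """Pickle `payload` to `cache_file`, going through a temp file first."""
    cache_file = Path(cache_file)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so os.replace stays
    # on one filesystem and the final rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=cache_file.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(payload, handle, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_name, cache_file)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_encodings(cache_file: Path) -> Dict[str, Any]:
    """Load a payload written by save_encodings.

    Unpickling runs arbitrary code, so only load files you created yourself.
    """
    with open(cache_file, "rb") as handle:
        return pickle.load(handle)
```

A payload could look like `{"documents": [...], "embeddings": ..., "doc_masks": ...}`, where the values are whatever objects hold the encodings at runtime. Tensors usually pickle fine, although a format such as `torch.save` may be a better fit for large collections.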


faezs commented Aug 9, 2024

I'd like this too. I have the same workflow as you and a similar solution, but having it built in would be great. Maybe making it compatible with overwrite_index for cache invalidation would also be a good idea?
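One way cache invalidation could work, sketched under assumptions: store a fingerprint of the collection next to the saved encodings and re-encode whenever it no longer matches. The function names and the `overwrite` flag below are hypothetical. The flag only mirrors the idea behind overwrite_index and is not wired to the library's actual parameter.

```python
import hashlib
from typing import Iterable, Optional


def _update(digest, text: str) -> None:
    """Feed one length-prefixed string into the running hash."""
    encoded = text.encode("utf-8")
    # The length prefix keeps ["ab", "c"] and ["a", "bc"] from colliding.
    digest.update(len(encoded).to_bytes(8, "big"))
    digest.update(encoded)


def collection_fingerprint(documents: Iterable[str], model_name: str) -> str:
    """Hash the model name and every document, in order, into a hex digest."""
    digest = hashlib.sha256()
    _update(digest, model_name)
    for doc in documents:
        _update(digest, doc)
    return digest.hexdigest()


def should_reencode(
    stored_fingerprint: Optional[str],
    documents: Iterable[str],
    model_name: str,
    overwrite: bool = False,
) -> bool:
    """Return True when the cached encodings must be rebuilt."""
    if overwrite or stored_fingerprint is None:
        return True
    return stored_fingerprint != collection_fingerprint(documents, model_name)
```

Including the model name in the hash means that switching checkpoints also invalidates the cache, not just editing the documents.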

Collaborator

bclavie commented Aug 12, 2024

This is coming as part of the overhaul I semi-announced on Twitter (just on Twitter, to stay lowkey...)

I have no exact ETA, but these features will be available on the overhaul branch (which isn't installable right now as it'll crash, but will be very soon) within the next couple of weeks.

If you have just ~2k documents and want to improve latency, the best way forward will most likely be to use the HNSW index that'll ship as the native indexing mechanism for any collection under ~5k documents. It gets performance more or less matching exact search while being quite a bit quicker. Otherwise, something pretty similar to your mechanism will be added for loading/saving in-memory encodings.

Thanks for your interest!
