[HELP] Can Ragatouille take chunks with their embeddings as input for indexing? #264

Open
BlueKiji77 opened this issue Nov 23, 2024 · 0 comments

  • I want to index a long document with multi-vector embeddings.
  • Say I have span annotations for how I want the document chunked, and I want to apply late chunking to it; that is, I want the embeddings of all my tokens computed before I perform the chunking.
  • This way I get contextualized chunks, each with a multi-vector representation.
    Essentially, what I want is late chunking without the chunk-wise vector pooling at the end.

How do I do this with RAGatouille?
I know how to get multi-vector embeddings for my document, but I can't figure out how to index them and take advantage of the optimizations RAGatouille provides.
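
To make the setup concrete, here is a minimal sketch of the slicing step described above, assuming the token embeddings for the whole document are already available as one `(n_tokens, dim)` array and the span annotations are half-open `(start, end)` token ranges. The function name `late_chunk_multivector` and the random stand-in embeddings are hypothetical, for illustration only; nothing here is a RAGatouille or ColBERT API call.

```python
import numpy as np


def late_chunk_multivector(token_embeddings, spans):
    """Split document-level token embeddings into per-chunk matrices.

    Each chunk keeps all of its token vectors (no pooling), so every chunk
    is a (chunk_len, dim) multi-vector representation.
    """
    n_tokens = token_embeddings.shape[0]
    chunks = []
    for start, end in spans:
        # Spans are assumed to be half-open token ranges: [start, end).
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"invalid span ({start}, {end}) for {n_tokens} tokens")
        chunks.append(token_embeddings[start:end])
    return chunks


# Stand-in for real encoder output: 10 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((10, 4)).astype(np.float32)

# Hypothetical span annotations covering the document.
spans = [(0, 3), (3, 7), (7, 10)]

chunk_embeddings = late_chunk_multivector(token_embeddings, spans)
for (start, end), emb in zip(spans, chunk_embeddings):
    print(f"tokens [{start}, {end}) -> chunk matrix of shape {emb.shape}")
```

The result is a list of per-chunk embedding matrices; what I'm missing is a way to hand these precomputed matrices to the indexer instead of raw text.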

@BlueKiji77 BlueKiji77 changed the title Can colbert v2 take chunks with their embeddings as input for indexing? [HELP] Can colbert v2 take chunks with their embeddings as input for indexing? Nov 23, 2024
@BlueKiji77 BlueKiji77 changed the title [HELP] Can colbert v2 take chunks with their embeddings as input for indexing? [HELP] Can Ragatouille take chunks with their embeddings as input for indexing? Nov 23, 2024