Best way of indexing a large text file #7
Replies: 2 comments 3 replies
-
Thanks @ReneReiterer… Vectra is in its early stages and will gain more features over time, so there are a few pieces you'll need to pull in from other libraries. You basically need a text chunker that breaks your large text file into chunks. I'd recommend the chunker in LangChain.JS; the class is called RecursiveCharacterTextSplitter. You need to give the class a chunk size (I'd recommend 1600 characters) and an overlap size (I'd recommend 200 characters). You can pass the whole file into this chunker, generate embeddings for each chunk, and store each chunk in a LocalIndex. Store the text of the chunk as metadata for that chunk; then, when you query the index with the embeddings for a user's query, take the top 5 chunks and add their text to your prompt.
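To make the chunk size and overlap parameters concrete, here's a minimal sketch of a fixed-size character chunker. It's a simplified stand-in for LangChain.JS's RecursiveCharacterTextSplitter (which additionally tries to split on paragraph and sentence boundaries before falling back to raw characters); the function name `chunkText` is just illustrative.

```typescript
// Fixed-size chunker with overlap: each chunk is up to `chunkSize` characters,
// and consecutive chunks share `chunkOverlap` characters so that sentences cut
// at a boundary still appear whole in at least one chunk.
function chunkText(text: string, chunkSize = 1600, chunkOverlap = 200): string[] {
  const step = chunkSize - chunkOverlap; // distance between chunk start positions
  if (step <= 0) throw new Error("chunkOverlap must be smaller than chunkSize");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

From there, the flow is roughly: embed each chunk, then call Vectra's `insertItem` with the vector and the chunk text stashed in `metadata`, so a later `queryItems` call can hand the matched text straight to your prompt.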
-
Hello,
first off, I really love your package, it's so easy to work with.
But since I am pretty new to everything about vector databases, I can't really get my head around how to best index a large text file so it can be efficiently searched by your package. Any tips or ideas would be really helpful.