This issue was moved to a discussion.
[Question] Limiting index size #3
Comments
Thanks @tlaanemaa! You should be able to write a function like this to limit the number of items in the index:

    async function pruneIndex(index, max_items) {
        // Check the index against the retention policy
        const stats = await index.getIndexStats();
        if (stats.items <= max_items) {
            return;
        }

        // Remove the oldest items first
        await index.beginUpdate();
        try {
            const items = await index.listItems();
            while (items.length > max_items) {
                // shift() removes and returns the first (oldest) item
                const item = items.shift();
                await index.deleteItem(item.id);
            }
            await index.endUpdate();
        } catch (err) {
            // Roll back the pending changes if anything fails
            await index.cancelUpdate();
            throw err;
        }
    }

So a couple of thoughts on index size... Yes, it's a linear search, but it's still going to be faster than calling an external DB. As for index size, that really just depends on how much memory you want to throw at it. If you're looking to store a ton of items in an index, I'd be very selective about what metadata I store in the index itself. You could either use Vectra's ability to store all metadata externally (for 100,000+ items I'd strongly consider that) or just not store anything super big metadata-wise. If you're indexing documents, for example, consider storing position offsets for where a chunk starts and stops instead of the chunk text itself. You can then read the chunk in from the file using those offsets only when it's needed.

Another technique would be to organize your index into namespaces. Vectra doesn't have a direct namespace concept, but you can easily mimic namespaces by creating a separate folder and ...

With Vectra you actually have all the raw pieces needed to build a fairly large and capable DB. Some assembly required though... You'll probably get better perf optimizations on the metadata filtering side of things using a real DB, but for the core vector search task it should work and scale as well as anything else...

I'll also add that the early versions of Microsoft Exchange were just a bunch of files on disk, so don't underestimate what you can build using just a bunch of files and folders.
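To make the offsets idea above a bit more concrete, here is a minimal sketch using only Node's built-in `fs` module. The helper names `chunkOffsets` and `readChunk`, the `{ path, start, end }` metadata shape, and the fixed-size byte chunking are all assumptions made for illustration; none of them come from Vectra.

```javascript
const fs = require('node:fs');

// Hypothetical helper: describe a file as fixed-size chunks and return only
// byte offsets (not the chunk text). Each object is small enough to store as
// item metadata. Note: fixed-size byte chunks can split a multi-byte UTF-8
// character; a real chunker would split on text boundaries instead.
function chunkOffsets(filePath, chunkSize) {
    const size = fs.statSync(filePath).size;
    const chunks = [];
    for (let start = 0; start < size; start += chunkSize) {
        chunks.push({ path: filePath, start, end: Math.min(start + chunkSize, size) });
    }
    return chunks;
}

// Hypothetical helper: read one chunk back from disk, on demand, using the
// offsets that were stored as metadata.
function readChunk({ path, start, end }) {
    const buffer = Buffer.alloc(end - start);
    const fd = fs.openSync(path, 'r');
    try {
        // Read (end - start) bytes beginning at byte position `start`
        fs.readSync(fd, buffer, 0, buffer.length, start);
    } finally {
        fs.closeSync(fd);
    }
    return buffer.toString('utf8');
}
```

The idea would be to store each `{ path, start, end }` object as the item's metadata when inserting, and to call `readChunk` only on the handful of items a query actually returns, so the chunk text never has to sit in memory with the index.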
Nice, thanks. Btw, how do you keep metadata separately with Vectra? I didn't manage to find anything in the readme. Thanks!
Hello,
First of all, I want to congratulate you on the fantastic work you've done with this package! 🎉
I'm considering using the vectra package as a backend for a project running on Node, and was hoping to get some guidance on managing the index size. My goal is to limit the index size to a predefined number, such as 100,000. I'd like to implement this kind of functionality whenever a new item is added:
By doing this, I would retain only the last 100,000 values added to the index (ordered by when they were added). Is there a way to achieve this with vectra?
Additionally, I was wondering if you could share some insights on the reasonable limits for the index size, given that it employs a linear search. Do you think an index size of 100,000 would be feasible? How about 1 million?
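For a rough feel of what a linear search costs as the index grows, here is a small sketch of a brute-force cosine-similarity search. It is an illustration of the general technique only, not Vectra's actual implementation; the function names and the `{ id, vector }` item shape are assumptions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) {
        return 0; // treat a zero vector as matching nothing
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every item against the query, then keep the
// topK highest-scoring results.
function linearSearch(items, query, topK) {
    return items
        .map((item) => ({ id: item.id, score: cosineSimilarity(item.vector, query) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, topK);
}

// Lower bound on work per query: one multiply-add per dimension per item
// (the dot product alone, assuming vector norms are precomputed).
function estimateMultiplyAdds(itemCount, dimensions) {
    return itemCount * dimensions;
}
```

Assuming 1,536-dimensional embeddings (an assumed size, not something stated in this thread), `estimateMultiplyAdds(100000, 1536)` comes to about 153.6 million multiply-adds per query, and 1 million items to roughly 1.5 billion. The work grows in direct proportion to the item count, so whether either size is acceptable depends on your hardware and latency budget.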
I appreciate your assistance and look forward to your response!
Best regards,