-
Thanks @tlaanemaa! You should be able to write a function like this to limit the number of items in the index:

    async function pruneIndex(index, max_items) {
        // Check the index against the retention policy; nothing to do if within the limit
        const stats = await index.getIndexStats();
        if (stats.items <= max_items) {
            return;
        }

        // Remove the oldest items first (assumes listItems() returns items in insertion order)
        await index.beginUpdate();
        try {
            const items = await index.listItems();
            while (items.length > max_items) {
                // shift() removes and returns the first (oldest) item
                const item = items.shift();
                await index.deleteItem(item.id);
            }
            await index.endUpdate();
        } catch (err) {
            // Roll back the pending changes if anything fails
            await index.cancelUpdate();
            throw err;
        }
    }

So a couple of thoughts on index size... Yes, it's a linear search, but it's still going to be faster than calling an external DB. With regards to index size, that really just depends on how much memory you want to throw at it. If you're looking to store a ton of items in an index, I'd be very selective about what metadata I store in the index itself. You could either use Vectra's ability to store all metadata externally (for 100,000+ items I'd strongly consider that) or just not store anything super big metadata-wise. If indexing documents, for example, consider storing position offsets for where a chunk starts & stops instead of the chunk text itself. You can then read the chunk in from the file using those offsets only when it's needed.

Another technique would be to organize your index into namespaces. Vectra doesn't have a direct namespace concept but you can easily mimic namespaces by creating a separate folder and …

With Vectra you actually have all the raw pieces needed to build a fairly large and capable DB. Some assembly required though... You'll probably get better perf optimizations on the metadata filtering side of things using a real DB, but for the core vector search task it should work and scale as well as anything else... I'll also add that the early versions of Microsoft Exchange were just a bunch of files on disk, so don't underestimate what you can build using just a bunch of files and folders.
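To make the position-offset idea a bit more concrete, here's a minimal sketch using only Node's built-in `fs/promises` module. The `readChunk` helper and the idea of keeping `start` and `end` as byte offsets in the item's metadata are assumptions for illustration, not something Vectra provides. It also assumes the offsets fall on UTF-8 character boundaries.

```javascript
const fs = require('node:fs/promises');

// Hypothetical helper: read the bytes in [start, end) from a file and
// decode them as UTF-8. `start` and `end` are byte offsets that you would
// store in the item's metadata instead of the chunk text itself.
async function readChunk(filePath, start, end) {
  const handle = await fs.open(filePath, 'r');
  try {
    const length = end - start;
    const buffer = Buffer.alloc(length);
    // Read `length` bytes into the buffer, starting at byte `start` of the file
    const { bytesRead } = await handle.read(buffer, 0, length, start);
    return buffer.toString('utf8', 0, bytesRead);
  } finally {
    // Always release the file handle, even if the read throws
    await handle.close();
  }
}
```

With something like this, the metadata for a chunk only needs a file path and two numbers, and you'd call `readChunk(file, start, end)` for just the handful of results a query returns.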
-
Nice, thanks. Btw how do you keep metadata separately with Vectra? I didn't manage to find anything in the readme. Thanks!
-
Yeah, it's done the Pinecone way, which isn't overly obvious. When you create the index you can specify the list of fields you want indexed, and any other metadata fields will be stored externally. You currently need to specify at least one indexed field. So using your "infinite memory" use case:

    // Only `user_id` is listed as an indexed metadata field
    await index.createIndex({
        version: 1,
        metadata_config: {
            indexed: ['user_id']
        }
    });

Now you can insert new memories and only the …

    await index.insertItem({
        vector: [...],
        metadata: {
            user_id: `<user ID>`,
            date: Date.now(),
            text: `<text of memory>`
        }
    });

The …
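To illustrate what "indexed" versus "stored externally" means for a single item, here's a small standalone sketch. It is not Vectra's actual implementation, and how Vectra lays the external metadata out on disk isn't shown here; the `splitMetadata` helper is hypothetical and only shows which fields would land on which side for a given `indexed` list.

```javascript
// Hypothetical helper: split a metadata object into the fields named in
// `indexedFields` and everything else.
function splitMetadata(metadata, indexedFields) {
  const indexed = {};
  const external = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (indexedFields.includes(key)) {
      indexed[key] = value;
    } else {
      external[key] = value;
    }
  }
  return { indexed, external };
}

const { indexed, external } = splitMetadata(
  { user_id: 'u-123', date: 1700000000000, text: 'likes hiking' },
  ['user_id']
);
console.log(indexed);  // { user_id: 'u-123' }
console.log(external); // { date: 1700000000000, text: 'likes hiking' }
```

The point is that the in-memory index only has to carry the small fields you filter on, while bulky fields like `text` stay out of it.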
-
Hello,
First of all, I want to congratulate you on the fantastic work you've done with this package! 🎉
I'm considering using the vectra package as a backend for a project running on Node, and was hoping to get some guidance on managing the index size. My goal is to limit the index size to a predefined number, such as 100,000. I'd like to implement this kind of functionality whenever a new item is added:
By doing this, I would retain only the last 100,000 values added to the index (ordered by when they were added). Is there a way to achieve this with vectra?
Additionally, I was wondering if you could share some insights on the reasonable limits for the index size, given that it employs a linear search. Do you think an index size of 100,000 would be feasible? How about 1 million?
I appreciate your assistance and look forward to your response!
Best regards,