FAISS returns negative ids (not -1) #2135
Comments
This would be a bug. What type of index? Code to repro?
We use 'PCAR64,IVF4096(IVF512,PQ32x4fs,RFlat),SQ8' as the index type to index BERT embeddings. We apply L2 normalization both before training and before searching. This is not the exact code flow; I show it only to illustrate the key parts and their order.
The I in the previous line has the values [-7882858908496526721, 148477514, 7318772159358522531, 131445014, -8263696823219615651, 123521031, -5807271324421810311, 124208452, 38032875, 146904364, 139624482, 125867015, 139643914, 125254479, 18606842, 147101967, -8246501689735019874, 119442532, 141874179, 138070620, 130286272, 129548931, 131521583, 107358047, -8528840699497380558, 148568457, 127924406, 60198081, 23002488, 134854969, 38924547, 134703770, 33097768, 146073936, 69678871, 145498691, -6661535923009526919, 145471504, 137858014, 142931410, 137858015, 140687014, 140038207, 74294394]

The other parts of the code are quite long and object-oriented, and it is almost impossible to follow the flow from them, so I posted only the essential parts. If you still need the whole code, I can attach the notebook files or scripts.

We used the merge_on_disk.py example to index all of our 127 million entries on disk. For the merge operation, after creating several block.index files, we use

By the way, to update the current index we use
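The comment above mentions applying L2 normalization both before training and before searching. As a minimal sketch of that step, the pure-Python helper below scales each row to unit length; the name `l2_normalize` and the sample vectors are hypothetical and not taken from the original code. With the Faiss Python bindings, `faiss.normalize_L2` does the same thing in place on a float32 NumPy array.

```python
import math


def l2_normalize(vectors):
    """Return a copy of `vectors` with every row scaled to unit L2 norm."""
    normalized = []
    for row in vectors:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows untouched to avoid dividing by zero.
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


# Hypothetical data: the same function is applied to training and query vectors,
# so both sides of the search live on the unit sphere.
train_vectors = l2_normalize([[3.0, 4.0], [1.0, 0.0]])
query_vectors = l2_normalize([[0.0, 2.0]])
```

Whatever normalization is used, the point is that training data, added data and queries all go through the same transformation.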
I think the cause of the problem is adding new data. When I add entries to the existing index, it breaks the pipeline, and I don't know why. The main.index and pseudo.index that I showed in the previous post were working, but after adding new data they broke. They all use the same merge file; maybe that is the cause, I don't know.
I tried to find the source of the error and found that the ids are corrupted after using
Our index is on disk. Is the problem on our side? Do we have to add new entries by creating new blocks and then merging them again? Is add_with_ids supported for on-disk indexes? @mdouze
Hi @abdullahbas,
@fonspa Actually, we solved it by alternating between two indexes. We create the new index and delete the old one asynchronously. Once everything is complete on the new index, we load it as the main index and then delete the old one. You have to manage the blocking process as efficiently as you can. We update only the blocks affected by new messages, which takes just 2-3 minutes.
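The two-index workaround described above can be sketched roughly as follows. This is only an illustration under assumptions: `publish_index` and `build_fn` are hypothetical names, and the sketch treats the index as a single file, whereas an on-disk index may consist of more than one file, each of which would need the same treatment.

```python
import os
import tempfile


def publish_index(build_fn, live_path):
    """Build a new index file next to `live_path`, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(live_path))
    fd, staging_path = tempfile.mkstemp(dir=directory, suffix=".staging")
    os.close(fd)
    try:
        # Hypothetical callback: writes the complete new index to staging_path.
        build_fn(staging_path)
        # os.replace overwrites the old file in one step on the same filesystem,
        # so readers see either the old index or the new one, never a partial file.
        os.replace(staging_path, live_path)
    except BaseException:
        # On failure, drop the staging file and keep the old index untouched.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise
```

The old index is never modified in place, so a failed or interrupted rebuild leaves the previously working file intact.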
Sorry for the late answer.
BTW, I have run into the problem that FAISS returns some -1 indices, but I cannot find any explanation of what -1 means. I have checked that index.ntotal = 1027120, with method=IVFFlat, nlist=1024 and nprobe=4. I searched with xq.shape=(1024, 488) for k=50 neighbors. It returns the following:
I have also checked the X that I used to build the index: all the elements are finite and <10. Does anyone know the reason?
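One documented cause of -1 labels is that the search found fewer than k results, which can happen with IVF indexes when the probed lists together hold fewer than k vectors. The toy model below is not Faiss code; it is a pure-Python sketch with hypothetical names and data that only illustrates the padding behaviour. Whether this is the cause in the case above cannot be determined from the comment alone.

```python
def ivf_search_ids(inverted_lists, probed_lists, distances, k):
    """Toy IVF search: scan only the probed lists and pad the result to length k with -1."""
    candidates = [vid for list_no in probed_lists for vid in inverted_lists[list_no]]
    candidates.sort(key=lambda vid: distances[vid])
    top = candidates[:k]
    # Not enough candidates in the probed lists: fill the remaining slots with -1.
    return top + [-1] * (k - len(top))


# Hypothetical index with three inverted lists and precomputed query distances.
inverted_lists = {0: [10, 11], 1: [12], 2: [13, 14, 15]}
distances = {10: 0.3, 11: 0.1, 12: 0.2, 13: 0.9, 14: 0.5, 15: 0.7}

# Probing lists 0 and 1 yields only three candidates for k=5.
print(ivf_search_ids(inverted_lists, probed_lists=[0, 1], distances=distances, k=5))
# prints [11, 12, 10, -1, -1]
```

In this model, probing more lists (a larger nprobe) gives more candidates and fewer -1 slots.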
Hi, can you explain in more detail what you did to make it work? We have the same issue now: any operation on on-disk indexes corrupts the index. I have tried:
All of them corrupt the index and make search return wrong indices.
@mdouze Sorry, I have the same issue of invalid indices after trying to add data in any way to the on-disk index.
All of them produce invalid indices. My questions are:
Thank you in advance.
Summary
FAISS returns very large negative ids (not -1), like -663323444433213679, and very large positive ids, like 77711200039921993321, even though we do not have those ids in the index.
Faiss version: 1.7.1
Installed from: pypi
Running on:
Interface:
Reproduction instructions
The code is very complicated, so I couldn't share a snippet, but the search returns something like
[-7882858908496526721, 148477514, 7318772159358522531, 131445014, -8263696823219615651, 123521031, -5807271324421810311, 124208452, 38032875, 146904364, 139624482, 125867015, 139643914, 125254479, 18606842, 147101967, -8246501689735019874, 119442532, 141874179, 138070620, 130286272, 129548931, 131521583, 107358047, -8528840699497380558, 148568457, 127924406, 60198081, 23002488, 134854969, 38924547, 134703770, 33097768, 146073936, 69678871, 145498691, -6661535923009526919, 145471504, 137858014, 142931410, 137858015, 140687014, 140038207, 74294394]
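A quick way to tell corrupted ids apart from the legitimate -1 placeholder is to compare the returned labels against the set of ids that were actually added. The sketch below assumes such a set is available; `find_invalid_ids` and `known_ids` are hypothetical names, and treating the two small ids as valid is an assumption made only for the example.

```python
def find_invalid_ids(returned_ids, known_ids):
    """Return the ids that are neither -1 nor among the ids added to the index."""
    return [i for i in returned_ids if i != -1 and i not in known_ids]


# First four labels from the output above; known_ids is assumed for illustration.
returned = [-7882858908496526721, 148477514, 7318772159358522531, 131445014]
known_ids = {148477514, 131445014}

invalid = find_invalid_ids(returned, known_ids)
# invalid == [-7882858908496526721, 7318772159358522531]
```

Running such a check right after each add or merge step would narrow down which operation first introduces the bad ids.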