Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nmslib indexer #2417

Merged
merged 22 commits into from
Jul 7, 2019
Merged

Add nmslib indexer #2417

merged 22 commits into from
Jul 7, 2019

Conversation

masa3141
Copy link
Contributor

Hi, I added nmslib indexer.

Some research shows nmslib is better than annoy indexer.
https://erikbern.com/2018/06/17/new-approximate-nearest-neighbor-benchmarks.html
https://www.benfrederickson.com/approximate-nearest-neighbours-for-recommender-systems/

This is the first time to contribute to gensim. If I miss something, please let me know.

@piskvorky
Copy link
Owner

piskvorky commented Mar 15, 2019

Thanks @masa3141 !

Ping @searchivarius – would you have capacity to review this PR? (choice of NMSLIB APIs, default parameters, etc). Cheers.

@searchivarius
Copy link

@piskvorky yes, will do soon. thanks letting me know!

@masa3141
Copy link
Contributor Author

Ping @searchivarius
Hi could you review the code? Thanks!

@searchivarius
Copy link

Please, bear with me. There was recently reported what seems to be a serious bug. I need to fix it first.

@masa3141
Copy link
Contributor Author

I see. Thank you!

@masa3141
Copy link
Contributor Author

@searchivarius
Sorry to bother you again.
Could you review the code?
Thanks!

@searchivarius
Copy link

@masa3141 sorry for the delay, I still haven't resolved the issue with the recently reported NMSLIB bug. In fact, knnQuery might not work, but I don't know yet why, knnQueryBatch seems to be working fine.
BTW, do I get it correctly that you use only the cosine distance?

@masa3141
Copy link
Contributor Author

Thank you for your support in your busy time. Please review at your convenience.
Yes, I only use cosine distance because annoy indexer in gensim only supports cosine distance.

@mpenkov mpenkov added the feature Issue described a new feature label Apr 28, 2019
@searchivarius
Copy link

Ok, I have started working on this. Should be ready soon!

@searchivarius
Copy link

Hi @masa3141 there seems to be, indeed, an issue with knnQuery. Could you slightly change your code so it uses knnQueryBatch instead? I am going to fix knnQuery very soon, but it will take time and the older NMSLIB versions will still be broken.

Please, make query matrix 2-dim numpy, example is here:
https://github.com/nmslib/nmslib/blob/master/python_bindings/notebooks/search_vector_dense_optim.ipynb

Thank you!

@masa3141
Copy link
Contributor Author

Hi @searchivarius, thank you for the review!
I changed the code to use knnQueryBatch instead of knnQuery.
Thanks!

@masa3141
Copy link
Contributor Author

masa3141 commented May 17, 2019

Did the test fail due to Memory Error?
Oh, it seems to fail to install nmslib.

@searchivarius
Copy link

@masa3141 no it's related to the using wrong memory layout for numpy array data. I am quite surprised it wasn't caught before.

@masa3141
Copy link
Contributor Author

How do I fix it? Does it happen only in python2.7?

@searchivarius
Copy link

@masa3141 ohhh I missed that you changed to knnQueryBatch already. I will review shortly.

@searchivarius
Copy link

Hi @masa3141 and @piskvorky I have reviewed the querying part. It does look fine to me, many thanks! It would be nice to double check how all is working though when things are merged.

@masa3141
Copy link
Contributor Author

@searchivarius Thank you for the review!
@piskvorky It seems to fail to install nmslib in ci container in Python2.7. How do I fix this?

@piskvorky
Copy link
Owner

No idea. CC @mpenkov .

@masa3141
Copy link
Contributor Author

I changed to install nmslib into CI only when python version is over 3.0.
It seems to work.
(The test on python2.7 windows is failing. This failure happens in other pull requests.)

@masa3141
Copy link
Contributor Author

@piskvorky @mpenkov
Hi, do I need to do anything?
Thanks.

@mpenkov
Copy link
Collaborator

mpenkov commented May 30, 2019

@masa3141 Not in the immediate future. Let us work out why Appveyor is failing. I'll get back to you once that's done.

@mpenkov mpenkov self-assigned this May 30, 2019
@masa3141
Copy link
Contributor Author

masa3141 commented Jun 6, 2019

@mpenkov Thanks for fixing Appveyor issue! When will this feature be merged?

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patience. I finally got around to reviewing this. Please have a look at the comments.

gensim/similarities/nmslib.py Outdated Show resolved Hide resolved
gensim/similarities/nmslib.py Outdated Show resolved Hide resolved
gensim/similarities/nmslib.py Outdated Show resolved Hide resolved
@masa3141
Copy link
Contributor Author

@mpenkov
py27-win test fails. Do you know the reason and solution?
Thanks.

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 29, 2019

Hmm, looks like a dependency problem:

ERROR:   py27-win: could not install deps [numpy==1.11.3, scipy==0.18.1, .[test-win]]; v = InvocationError(u"'C:\\projects\\gensim-431bq\\.tox\\py27-win\\Scripts\\pip.EXE' install --timeout=60 --trusted-host 28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com --find-links http://28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com/ numpy==1.11.3 scipy==0.18.1 '.[test-win]'", 2)

@masa3141
Copy link
Contributor Author

Interesting. It didn't happen before. From today this happens.
I have no idea to fix this. Could you look at this problem? I think this will happen in other pull requests.

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 29, 2019

OK, leave it with me. Once I fix this, I will ping you.

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picked up one more thing. While I'm messing with appveyor, please use underscores instead of camel cases in test method names.

I fixed a couple myself, leaving the rest up to you.

gensim/test/test_similarities.py Outdated Show resolved Hide resolved
gensim/test/test_similarities.py Outdated Show resolved Hide resolved
gensim/test/test_similarities.py Outdated Show resolved Hide resolved
gensim/test/test_similarities.py Outdated Show resolved Hide resolved
@masa3141
Copy link
Contributor Author

I found there are many camel cases in test method names. (Ex, test_nmf.py, test_sklearn_api.py)
Should we change to use underscores in test_similarities.py?

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 30, 2019

Yes, we should, but those files are out of scope for this PR.

Let's handle the nmslib test cases only for now, and leave the rest for another PR.

mpenkov and others added 8 commits June 30, 2019 15:12
Upgrade pip
I'll need to think of another way to do this.

This reverts commit 32e6e51.
This reverts commit e89b8e2.
This reverts commit 85b8d28.
This reverts commit a386f67.
This reverts commit fbec445.
@masa3141
Copy link
Contributor Author

I see. I changed to use underscores.

@mpenkov
Copy link
Collaborator

mpenkov commented Jun 30, 2019

Opened a new ticket with support: https://help.appveyor.com/discussions/problems/24208-unable-to-use-the-latest-version-of-pip-for-my-tox-builds

@masa3141
Copy link
Contributor Author

masa3141 commented Jul 6, 2019

@mpenkov
Is there anything I should do?

@mpenkov mpenkov merged commit c9ec242 into piskvorky:develop Jul 7, 2019
@mpenkov
Copy link
Collaborator

mpenkov commented Jul 7, 2019

@masa3141 Finally merged. Thank you for your contribution and your patience!

@masa3141
Copy link
Contributor Author

masa3141 commented Jul 7, 2019

Thanks a lot!

@searchivarius
Copy link

Hurray, thanks everybody!

@masa3141
Copy link
Contributor Author

masa3141 commented Jul 8, 2019

@mpenkov Hi, 3.8.0 release doesn't contain this nmslib feature? The release note doesn't mention this feature. Thanks

@mpenkov
Copy link
Collaborator

mpenkov commented Jul 8, 2019

@masa3141 Thank you for pointing it out. Looks like our changelog generation script missed the nmslib PR. The indexer itself is definitely there: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/similarities/nmslib.py

I will update the change log.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Issue described a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants