Remove NMSLIB dependency #473

Closed
nanthony007 opened this issue Apr 12, 2023 · 15 comments

@nanthony007

I'm not sure whether this is possible or what alternatives exist, but due to years of inactivity and unresponsiveness on the primary nmslib maintainer's side (not faulting him), the nmslib dependency makes scispacy very inaccessible to new users. In fact, it will remain completely inaccessible to users on new operating systems (Windows 11) or running modern versions of Python (3.11).

Are there any possible alternatives for the few lines of code where this package uses nmslib?

From what I can see those are primarily two calls to nmslib.init() and otherwise type annotations.

Please advise. If possible, I would love to help here, but I am not comfortable writing robust production C++ code, nor am I an expert on the scispacy models themselves.

@dakinggg
Collaborator

Hi @nanthony007, replacing nmslib with another approximate nearest neighbor search library is certainly doable, but is a bit more involved than you might realize. The candidate generator (class CandidateGenerator) uses nmslib for the approximate nearest neighbor search, so we would need to swap that out for another library. That means recreating the index with a different library (see the create_tfidf_ann_index function for how it is done with nmslib), then rewriting the code to load and use that index, and then evaluating the candidate generation to make sure speed and accuracy are still on par with the previous implementation. This is unfortunately not something I am likely to have time to do in the near future, but I will try.
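
As a rough illustration of the evaluation step described above, the sketch below measures recall@k of a candidate search function against an exact brute-force baseline. It is not scispacy's code: the names cosine_distance, exact_knn, and recall_at_k are hypothetical, the vectors are dense toy data rather than the real TF-IDF index, and cosine distance is only an assumed metric.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(data: List[Vector], query: Vector, k: int) -> List[int]:
    """Brute-force baseline: indices of the k nearest vectors in `data`."""
    order = sorted(range(len(data)), key=lambda i: cosine_distance(data[i], query))
    return order[:k]


def recall_at_k(
    search: Callable[[Vector, int], List[int]],
    data: List[Vector],
    queries: List[Vector],
    k: int,
) -> float:
    """Average fraction of the true k nearest neighbors that `search` returns."""
    total = 0.0
    for query in queries:
        truth = set(exact_knn(data, query, k))
        found = set(search(query, k))
        total += len(truth & found) / len(truth)
    return total / len(queries)


# Toy data; a real evaluation would use the actual index vectors and queries.
rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]


def half_index_search(query: Vector, k: int) -> List[int]:
    """Stand-in for an approximate backend: exact search over half the data."""
    return exact_knn(data[:100], query, k)


print("recall@10 of stand-in backend:", recall_at_k(half_index_search, data, queries, 10))
```

Any replacement library could be wrapped in a function with the same (query, k) signature and scored the same way; timing the calls alongside recall would cover the speed half of the comparison.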

That being said, I recently installed nmslib successfully on Windows Subsystem for Linux with Python 3.10. Python 3.11 likely does not work, as you say.

@nanthony007
Author

Ideally, I wanted to include scispacy as a dependency of a package that gives more novice programmers simple access to biomedical NER. Using WSL and/or navigating dependency versions (Python, scispacy, etc.) seems like mental overhead I want to avoid.

Is there a way this model could be retrained using spaCy's new entity linker itself? Could that accomplish the same NEL while benefiting from scispacy's models?

@nanthony007
Author

I wonder if annoy could be a good fit for an alternative ANN index?

@nanthony007
Author

Please see #481

@nanthony007
Author

Closing due to no clear direction forward...

@phaeta

phaeta commented Jun 7, 2023

@nanthony007 I was able to build scispacy for Python 3.11 by using the latest pybind11 (2.10.4) and building nmslib from the master branch, e.g.:

# in a clean virtual environment
pip install pybind11==2.10.4
pip install "nmslib @ git+https://github.com/nmslib/nmslib.git@ade4bcdc9dd3719990de2503871450b8a62df4a5/#subdirectory=python_bindings"
pip install scispacy
...

(ade4bcdc9dd3719990de2503871450b8a62df4a5 was the last commit to master, quite a while ago.)
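
After running install steps like the ones above, a quick way to confirm that the packages actually load is to try importing them. The sketch below is an assumption about how one might check this, not part of the original instructions; can_import is a hypothetical helper, and it only shows that the import succeeds, not that the package behaves correctly.

```python
import importlib
import sys


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False if the import fails."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a compiled extension that fails to load.
        return False
    return True


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name in ("pybind11", "nmslib", "scispacy"):
    print(f"{name}: {'ok' if can_import(name) else 'not importable'}")
```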

@dakinggg
Collaborator

dakinggg commented Jun 8, 2023

Thanks @phaeta ! Could you share what OS you are on?

@phaeta

phaeta commented Jun 8, 2023

@dakinggg macOS Ventura

@nanthony007
Author

nanthony007 commented Jun 12, 2023

Unfortunately, I am unable to replicate this. Copying your git install command resulted in git not finding the revision. After removing the trailing "/", pip attempts to build and install the wheels but fails during the Clang build. @phaeta are you on M1 or Intel? Are you using conda Python?

The build errors I am getting appear to involve the SIMD scalar-product code:

-std=c++14 -fvisibility=hidden
  ./similarity_search/src/distcomp_scalar.cc:85:9: error: pragma message requires parenthesized string
  #pragma message WARN("ScalarProductSIMD<float>: SSE2 is not available, defaulting to pure C++ implementation!")
          ^
  ./similarity_search/src/distcomp_scalar.cc:169:18: warning: explicit instantiation of 'NormScalarProductSIMD<float>' that occurs after an explicit specialization has no effect [-Winstantiation-after-specialization]
  template float   NormScalarProductSIMD<float>(const float* pVect1, const float* pVect2, size_t qty);
                   ^
  ./similarity_search/src/distcomp_scalar.cc:83:7: note: previous template specialization is here
  float NormScalarProductSIMD(const float* pVect1, const float* pVect2, size_t qty) {
        ^
  ./similarity_search/src/distcomp_scalar.cc:195:9: error: pragma message requires parenthesized string
  #pragma message WARN("ScalarProductSIMD<float>: SSE2 is not available, defaulting to pure C++ implementation!")
          ^
  ./similarity_search/src/distcomp_scalar.cc:246:18: warning: explicit instantiation of 'ScalarProductSIMD<float>' that occurs after an explicit specialization has no effect [-Winstantiation-after-specialization]
  template float   ScalarProductSIMD<float>(const float* pVect1, const float* pVect2, size_t qty);
                   ^
  ./similarity_search/src/distcomp_scalar.cc:193:7: note: previous template specialization is here
  float ScalarProductSIMD(const float* pVect1, const float* pVect2, size_t qty) {
        ^
  2 warnings and 2 errors generated.
  error: command '/usr/bin/clang' failed with exit code 1
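
SSE2 is an x86 instruction set, so the "SSE2 is not available" message in the log above is consistent with a build on a non-x86 machine such as Apple Silicon; that reading is an inference from the log, not something confirmed here. A minimal sketch for collecting the platform details that come up in this discussion (OS, architecture, Python build) follows. describe_environment and is_x86 are hypothetical helper names, and the set of x86 machine strings is an assumption that may not be exhaustive.

```python
import platform

# Common values of platform.machine() on x86 hardware (assumed, not exhaustive).
X86_MACHINES = {"x86_64", "amd64", "i386", "i686", "x86"}


def is_x86(machine: str) -> bool:
    """Return True if the machine string names an x86 architecture."""
    return machine.lower() in X86_MACHINES


def describe_environment() -> dict:
    """Collect the platform details that matter when reporting a build failure."""
    machine = platform.machine()
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": machine,
        "is_x86": is_x86(machine),
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```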

@phaeta

phaeta commented Jun 14, 2023

@nanthony007 Try this:
pip install "nmslib @ git+https://github.com/nmslib/nmslib.git/#subdirectory=python_bindings"

Regarding architecture, I'm using an Intel Mac. I'm using python@3.11 from Homebrew. Also the master-branch nmslib build works for me on Linux (Ubuntu 20.04 (aarch64), Python 3.11 built from source).

I'll play around with this in a container and put together a Dockerfile. Also, I have access to an M1 Mac Mini; I'll try things there too. Stay tuned

@nanthony007
Author

Okay, thanks! That command also does not work, so maybe it's something with M1? My main concern is M1 and Windows 11 support, since I think most students will likely be on those platforms.

@fsecada01

> @nanthony007 Try this: pip install "nmslib @ git+https://github.com/nmslib/nmslib.git/#subdirectory=python_bindings"
>
> Regarding architecture, I'm using an Intel Mac. I'm using python@3.11 from Homebrew. Also the master-branch nmslib build works for me on Linux (Ubuntu 20.04 (aarch64), Python 3.11 built from source).
>
> I'll play around with this in a container and put together a Dockerfile. Also, I have access to an M1 Mac Mini; I'll try things there too. Stay tuned

I can confirm that this works for Windows 11 and Python 3.11.

@umayerr

umayerr commented Jun 24, 2024

> > @nanthony007 Try this: pip install "nmslib @ git+https://github.com/nmslib/nmslib.git/#subdirectory=python_bindings"
> > Regarding architecture, I'm using an Intel Mac. I'm using python@3.11 from Homebrew. Also the master-branch nmslib build works for me on Linux (Ubuntu 20.04 (aarch64), Python 3.11 built from source).
> > I'll play around with this in a container and put together a Dockerfile. Also, I have access to an M1 Mac Mini; I'll try things there too. Stay tuned
>
> I can confirm that this works for Windows 11 and Python 3.11.

Unfortunately, this solution isn't working on my Intel machine. I'm running Debian 12 with Python 3.11. Has anyone tested this on Linux?

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> nmslib

@dakinggg
Collaborator

Hey @nanthony007 and @umayerr, could you try installing with mamba as per #520 (comment)? I'd like to see whether it works for others.

@nanthony007
Author

Unfortunately, I'm not in a position to let one package dictate my package manager selection, so mamba is a no for me and my use case. Thanks for the follow-up on this, though. I've resorted to isolating scispacy processes in completely separate VMs from the other services.
