numpy 1.19.2 incompatible with gensim 4.1.0 #3226
Comments
Same error here, any plans to solve it soon? |
I also have the same issue. |
Thanks all for reporting. Possible duplicate of #3097. @mpenkov I thought we fixed this in #3095 (Gensim 4.0.1). Is this a regression? @subbuvidyasekar upgrade Gensim, the old 3.8.3 version is no longer supported. |
Hard to say at this stage. The previous fix may have been incomplete (e.g. #3097 (comment)). We'll need to investigate. |
Maybe something to do with conda? I see that mentioned above – it's like a separate distribution center, so god knows what they do (or don't do) with numpy. We only support standard packages (PyPI, @martinobertoni can you try replicating without conda? Using just |
Hi and thanks for looking into this
|
found a fix, let me know if it works for you:
This apparently recompiles stuff using whatever numpy is on the system. |
Thanks @martinobertoni . According to https://github.com/scipy/oldest-supported-numpy/blob/master/setup.cfg#L46 , the binary wheel for Gensim on Python 3.7 should be compiled against numpy 1.14.5. Is that correct @mpenkov ? And since 1.14.5 < 1.19.2 (backward compatible), the wheel should work… but doesn't. I forgot what the numpy kerfuffle was, they changed their binary compatibility somehow. I'd have to re-read #3095 and #3097. EDIT: @mpenkov it looks like with 4.0.0, we had to do a quick bugfix release 4.0.1 because the |
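The expectation in the comment above (build against an old numpy, run against the same or a newer one) can be written down as a small check. This is only a sketch of the rule as the thread understands it, not something numpy itself enforces. `parse_version` and `expected_to_work` are hypothetical helper names, and only plain `X.Y.Z` strings are handled.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def expected_to_work(build_numpy, runtime_numpy):
    """Expectation only: a runtime numpy at least as new as the build-time numpy is fine."""
    return parse_version(runtime_numpy) >= parse_version(build_numpy)


# Wheel built against 1.14.5, run with 1.19.2: the rule says this should work.
print(expected_to_work("1.14.5", "1.19.2"))  # True

# Wheel built against a newer numpy than the one installed: expected to break.
print(expected_to_work("1.22.0", "1.21.2"))  # False
```

The first case is the one reported here: the rule predicts success, yet the import fails, which is why the build setup became the suspect.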
Yeah, I've also purged that information from my mind since the previous release, so it'll be a fresh start for me too. I'll try to cut out some hours during the weekend (or next week) and sit down to deal with this. |
Another report: https://twitter.com/nutniti1/status/1433702323335749637 Looks like 4.1.0 affected many people. |
Yes, I can confirm. 4.0.1 works fine on Python 3.8 on Ubuntu, however 4.1.0 fails. The Numpy version is 1.21.2. |
Thank you everyone for your patience. I think I've tracked down the problem. Our github actions workflow (a recent contribution 959f2dd) was not using oldest-supported-numpy. I think this is what caused the problem. Can you please confirm that the wheels built off the most recent develop head work? |
Unfortunately, my efforts did not help resolve the issue. I can still reproduce the problem, even with the newly built wheels. I'm not sure what's going on here. We're definitely using oldest-supported-numpy to build the wheels:
(from https://github.com/RaRe-Technologies/gensim/runs/3545153699?check_suite_focus=true#step:6:109) but it doesn't seem to be helping:
|
Yeah that's weird. Does it work with numpy 1.14.5 (used during wheel building here), instead of numpy 1.19.2? |
No, there's a different problem there:
Installing gensim explicitly requires a more recent version of numpy Could this be the cause of the problem? It's a bit late here, I might have to revisit this in the next couple of days with a fresh head. |
I don't think that's a problem; 1.19.2 >= 1.17.0. Unless someone has an idea, we might have to open a ticket at numpy again, ask what's going on. But seeing as we had the same problem with 4.0.0 (fixed in 4.0.1), I suspect the same fix would help 4.1.0 too. @mpenkov do you remember what you did there? Since 4.0.1 still works by all accounts, even with numpy 1.19.2, I suspect the issue is somewhere on our side rather than numpy's. |
The windows wheel build was ignoring the oldest-supported-numpy dependency, so I made sure to explicitly install it prior to building the wheel. |
There are several potential problem points, in order of rabbit-hole depth:
The most likely by far is our build system (scripts we use for building wheels). It uses multibuild, which in turn uses manylinux. One thing to try would be to reproduce the problem with the fewest moving parts:
If there's still a problem, we may eliminate multibuild and manylinux from the above list of problems. It would also be a step in the direction of producing a reproducible example to show to other people. If there's no problem, then something in our build system is messed up, and needs further investigation. I can't think of anything better at the moment, so I'll try the above and report back when I have results. If anyone else is willing to play along at home, then please be welcome to do so ;) |
@piskvorky @gojomo I've reproduced the problem locally, without using our CI/multibuild/manylinux. I think something is wrong with our setup.py and we need to look there. The steps are:
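Can you please try reproducing this?
build.sh
#!/usr/bin/env bash
# Build a Gensim wheel against a chosen numpy; run from the root of a Gensim checkout.
# Usage: ./build.sh [numpy requirement], e.g. ./build.sh numpy==1.17.0
set -euxo pipefail
# Default to oldest-supported-numpy when no requirement is given.
numpy_str="${1:-oldest-supported-numpy}"
# Start from a clean virtualenv (the interpreter path is specific to this machine).
rm -rf wheel-builder.env
virtualenv -p /usr/local/opt/python\@3.7/bin/python3.7 wheel-builder.env
source wheel-builder.env/bin/activate
python --version
pip --version
# Install the numpy to build against, then record the resulting environment.
pip install "$numpy_str"
pip freeze
# Build the wheel from the current directory.
pip -v wheel .
rm -rf wheel-builder.env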
test.sh
|
I've tried building against various versions of numpy locally (MacOS, Py3.7) and then testing with 1.19.2:
(the above is purely trial and error) This is surprising to me. According to the official numpy recommendation, code built against oldest-supported-numpy (1.14.5 in this particular case) should work with all future versions of numpy. The results above contradict that, so it may be worth raising this with the numpy guys. In the immediate sense, how about we build against 1.17.0 instead of oldest-supported-numpy? That would allow us to make a bugfix release quickly. @piskvorky @menshikh-iv @gojomo What are your thoughts? |
I also made some tests on Ubuntu with py3.7 and I'm +1 for using |
Do we build against numpy 1.17.0 for all Python versions, on all platforms? Or is there a better way to go? |
Logic from
|
If you wish to build and test against NumPy 1.14.5, you cannot use |
@PrimozGodec Do you happen to know if this updated constraint in #3236 allows the build to work on the new Apple processors (M1 etc)? (Or, if this doesn't do it, any ideas on the minimal build-params update that might help that work?) |
@gojomo it is something new to me; I only noticed during this conversation that separate wheels exist for the new Mac processors. I googled a bit and it seems that the x86_64 wheels most packages provide are not compatible with these processors. Packages that build platform-specific wheels will gradually need to start building arm64 wheels. Currently, users of computers with the new processors need to build the package themselves (when they call pip install, the package's .tar.gz is downloaded and built locally, provided the required libraries exist on the computer). |
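The fallback described above (use a wheel if one matches the machine, otherwise build from the source archive) can be sketched roughly as follows. This is a deliberate simplification: real pip matches full platform tags, not just the CPU architecture, and `install_path` is a hypothetical helper, not part of pip.

```python
import platform


def install_path(machine, wheel_architectures):
    """Very rough model of the choice: matching wheel, else source build."""
    available = {arch.lower() for arch in wheel_architectures}
    if machine.lower() in available:
        return "binary wheel"
    return "source build from .tar.gz"


# Architecture of the machine running this snippet (e.g. 'x86_64' or 'arm64').
print(platform.machine())

# An arm64 Mac when only x86_64 wheels are published.
print(install_path("arm64", ["x86_64"]))  # source build from .tar.gz
```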
@gojomo IIRC Gensim already builds aarch64 wheels – is that not enough? |
They are for Linux. We don't build them for MacOS yet. |
It may be as simple as just building them. The devil's always in the details, though, and once we start it's possible that previously unseen obstacles will come up. If there's demand, we can start looking into it for the next release. |
This solution worked for me too... |
I think so. |
I've never built wheels, nor do I have an M1 processor. But people with M1 processors have asked on the project discussion list, and M1 processors (& their imminent followups coming in newer Macs) are reported to have some exciting performance gains – so it'd be a good thing to get/confirm working, as soon as a capable dev with the right system/tools can do so. If |
[x] installs gensim==4.1.2 (stellargraph#2010 and piskvorky/gensim#3226) [x] fixes typo (?): notebook states training on "75%" but `train_test_split` ratio set to "0.1", which also affects final accuracy (85% new / 72% old) [x] sets `n_jobs=4` as default value in LogisticRegressionCV is resulting in non-convergence
I'm having this problem again with gensim 4.1.2 and python3.10 |
Can you be more specific about the exact |
@gojomo run the following command
|
Thanks for the 1-liner to replicate! It looks like this is a new combination-of-versions causing a similar binary incompatibility: As of Of course, we'd eventually want to fix this both for Python 3.10, and if at all possible more automatically handle normal Python & Another workaround possibility might be to force the local recompilation of Gensim binaries, rather than relying on a wheel that may be mismatched – via an option like the Relatedly: should perhaps Gensim's default installs go back to not including pre-compiled binaries, especially on OSes (like Linux) where that proceeds relatively smoothly? We might then only rely upon the extra specificity imposed by binary wheels in situations, such as MSWindows installs, where the process needs that extra hand-holding. |
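One way to force a local source build instead of using a prebuilt wheel is pip's `--no-binary` option. The sketch below assumes that is the kind of option meant here and is not necessarily the exact command intended in the comment above. `source_build_command` is a hypothetical helper that only assembles the command without running it.

```python
import sys


def source_build_command(package="gensim"):
    """Assemble a pip invocation that skips prebuilt wheels for `package`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir",          # do not reuse a previously cached wheel
        "--no-binary", package,    # refuse binary wheels for this package only
        package,
    ]


command = source_build_command()
print(" ".join(command[1:]))
# To actually run it: subprocess.run(command, check=True)
```

Whether such a build compiles against the numpy already installed depends on pip's build isolation settings, so this may not be sufficient on its own.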
I think precompiled wheels are good, it helps users a lot. I'd prefer to keep them for Gensim. TBH I don't understand why numpy gives us so much trouble. I thought they created one binary incompatibility way back, which wreaked havoc downstream (incl. Gensim) at one point. But I thought that was exceptional. Instead, binary numpy incompatibilities look like a common occurrence now :( Python 3.10 is coming in the next Gensim release, including wheels. The release is imminent but I have no idea how that affects the numpy mess. I also don't know how conda is implicated (which we don't support). |
My impression was pre-wheels, installs were still pretty straightforward on Unixes (albeit slightly slower), & the biggest benefit has been sparing Windows users from confusing buildchain choices there. But wheels also seem to require foreseeing/prebuilding-for all these varied configs, to avoid these binary-format mismatches. I don't believe |
Yeah, I don't understand what the problem here is, either. If numpy continues to give us pain, though, perhaps we should present users with a more helpful error message when this happens? e.g. "Under Python {your_python_version}, gensim {gensim_version} requires numpy {minimum_numpy_version} or above, but your numpy version is {current_numpy_version}. Consider upgrading numpy" What do you think? |
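The message proposed above could be produced by something like the sketch below. `numpy_mismatch_message` is a hypothetical helper, not an existing Gensim function, and the version numbers in the usage line are illustrative only. A real check would also have to work out the minimum numpy for the installed wheel, which this sketch does not attempt.

```python
import platform


def numpy_mismatch_message(python_version, gensim_version, minimum_numpy, current_numpy):
    """Format the user-facing hint suggested in the thread."""
    return (
        f"Under Python {python_version}, gensim {gensim_version} requires "
        f"numpy {minimum_numpy} or above, but your numpy version is "
        f"{current_numpy}. Consider upgrading numpy"
    )


# Illustrative values; the Python version is taken from the running interpreter.
print(numpy_mismatch_message(platform.python_version(), "4.1.2", "1.22", "1.21"))
```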
Isn't it the opposite? According to the above, we build with |
@GuillemGSubies I cannot reproduce your situation: for me (on Ubuntu 20.04 with conda 4.10.3) that one liner succeeds in building gensim from source and then using it. I think you should open a new issue with the complete log of your build. If you do so, please ping me there. One theory for the failure might be that you have numpy 1.22 installed somewhere and the gensim build is picking it up, but it is hard to tell without the log (even then it may be difficult to debug). The message you see indicates that gensim is being built with numpy1.22 (the "Expected 96 from C header" part of the error), and being run with a numpy version 1.20 or 1.21 (the "got 88 from PyObject" part of the error). Here are the sizes of
|
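The reading of the error given above (the "C header" number comes from the numpy used at build time, the "PyObject" number from the numpy present at run time) can be sketched as a small parser. `explain_size_mismatch` is a hypothetical helper. The text before "Expected" in the sample message is an assumed wording, and only the two quoted fragments come from this thread.

```python
import re

# Matches the two sizes quoted in the thread's error message.
SIZE_PATTERN = re.compile(r"Expected (\d+) from C header, got (\d+) from PyObject")


def explain_size_mismatch(error_text):
    """Return (build_time_size, runtime_size), or None if the text does not match."""
    match = SIZE_PATTERN.search(error_text)
    if match is None:
        return None
    build_time_size, runtime_size = (int(group) for group in match.groups())
    return build_time_size, runtime_size


# Sample message; the leading sentence is an assumption about the usual wording.
sample = ("size changed, may indicate binary incompatibility. "
          "Expected 96 from C header, got 88 from PyObject")
print(explain_size_mismatch(sample))  # (96, 88)
```

Per the explanation above, 96 would point to a build against numpy 1.22 and 88 to a runtime numpy of 1.20 or 1.21.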
I'm still getting this a year later with Python 3.8.8 and NumPy 1.20.1 on Monterey 12.6 with an i7 processor. |
@conduit242 I discovered the same a few days ago and proposed a PR which will fix it #3467. The problem is that in version 4.3.1, they accidentally started to build wheels on the newest Numpy. Until they accept the PR, you can use Gensim 4.3.0, which should work. |
Problem description
When importing gensim I get the following error
Steps/code/corpus to reproduce
Versions
Linux-5.11.0-25-generic-x86_64-with-debian-bullseye-sid
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0]
Bits 64
NumPy 1.19.2
SciPy 1.7.1