Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pip install of version 0.24 is broken for platforms without wheels #25193

Closed
mgoldey opened this issue Feb 6, 2019 · 27 comments
Closed

Pip install of version 0.24 is broken for platforms without wheels #25193

mgoldey opened this issue Feb 6, 2019 · 27 comments
Labels
Build Library building on various platforms Dependencies Required and optional dependencies Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@mgoldey
Copy link

mgoldey commented Feb 6, 2019

Code Sample, a copy-pastable example if possible

Easiest way to reproduce this is in a fresh docker container, as below

docker run --rm -it alpine:latest
$ apk update
$ apk add build-base python3 python3-dev py3-pip
$ pip3 install pandas
Collecting pandas
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    100% |████████████████████████████████| 11.8MB 4.2MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 346, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-y3ff_xpr/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-y3ff_xpr/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1136, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 348, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'

Problem description

Note that pip installing numpy and then pandas does work

pip3 install numpy
pip3 install pandas

But the standalone command is broken.

pip3 install pandas

Unfortunately, this means that a requirements.txt file is insufficient to for setting up a new environment with pandas installed (like in a docker container).

@mgoldey
Copy link
Author

mgoldey commented Feb 6, 2019

I believe this to be a function of recent changes because pandas==0.23.1 does install numpy correctly.

@gfyoung gfyoung added Build Library building on various platforms Dependencies Required and optional dependencies labels Feb 7, 2019
@gfyoung
Copy link
Member

gfyoung commented Feb 7, 2019

cc @TomAugspurger

This is really weird, as doing pip install pandas (without numpy installed) seems to be just fine...

@jorisvandenbossche
Copy link
Member

@gfyoung I suppose you tried with binary wheels?

I can reproduce this with, installing the following in a fresh environment with just python and pip:

(test-pip) joris@joris-XPS-13-9350:~/scipy$ pip install pandas --no-binary :all:
Collecting pandas
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    100% |████████████████████████████████| 11.8MB 1.7MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/home/joris/miniconda3/envs/test-pip/lib/python3.7/site-packages/pkg_resources/__init__.py", line 357, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-3txnb8eh/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-3txnb8eh/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/home/joris/miniconda3/envs/test-pip/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1142, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/home/joris/miniconda3/envs/test-pip/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-3txnb8eh/pandas/

For pandas 0.23.4 it indeed works.

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Feb 7, 2019
@jorisvandenbossche
Copy link
Member

From a quick look, this might be caused by the changes in #21879 (which I don't fully understand though), as it is caused by maybe_cythonize being ran in setup.py before setuptools is able to check and install dependencies.

cc @jbrockmendel

@jorisvandenbossche jorisvandenbossche added this to the 0.24.2 milestone Feb 7, 2019
@jorisvandenbossche jorisvandenbossche changed the title pip installation has broken dependency handling for numpy Pip install of version 0.24 is broken for platforms without wheels Feb 7, 2019
@snordhausen
Copy link

@jorisvandenbossche You are right, this is the change that broke things. I checked with this Dockerfile:

FROM alpine:3.8
RUN apk update
RUN apk add g++ libstdc++ python3-dev bash git 
RUN pip3 install --upgrade pip

# With the change
#RUN pip3 install numpy==1.16.1 git+https://github.com/pandas-dev/pandas@df2ccef
# 1 commit before:
RUN pip3 install numpy==1.16.1 git+https://github.com/pandas-dev/pandas@f11738b

@jbrockmendel
Copy link
Member

maybe_cythonize wraps cython's cythonize, so we may need to get a fix upstream. Shorter-term, I suppose there is something that happens within setuptools.setup that installs numpy (?) that we could shoe-horn into the top of maybe_cythonize

@jorisvandenbossche
Copy link
Member

What I understand from the diff in #21879 is that before, we just passed extensions to the setup(ext_modules=...), and now we already call cythonize (or the maybe wrapper) on them inside setup.py.
So my question then: what is exactly the reason that we started to use cythonize ? Or how do other projects deal with it?

@jbrockmendel
Copy link
Member

what is exactly the reason that we started to use cythonize ?

cythonize is recommended way to compile cython modules. A lot of the dependency tracking we used to do manually in setup.py cythonize handles automatically. cythonize also takes care of figuring out which files need to be re-compiled vs are re-usable (I know statsmodels used to do this manually, not sure about pandas)

@jorisvandenbossche
Copy link
Member

But this cythonize should only be called when we actually want to compile cython sources? While for an sdist, we already cythonized and you only want to compile the C sources. So should we have some conditional somewhere in there? Related to that, isn't this what the custom cmd_class is for? (to cythonize in the _build_ext)

I suppose there is something that happens within setuptools.setup that installs numpy (?) that we could shoe-horn into the top of maybe_cythonize

Can you detail this thought a bit more?

@jbrockmendel
Copy link
Member

I suppose there is something that happens within setuptools.setup that installs numpy (?) that we could shoe-horn into the top of maybe_cythonize

Can you detail this thought a bit more?

I don't know what happens within setuptools.setup, but am speculating that there is something like:

def setup(...):
    install_declared_dependencies(...)
    everything_else(...)

So the thought was that if maybe_cythonize being called before setup was a problem because dependencies were not installed, this might be fixable by calling install_declared_dependencies at the top of maybe_cythonize.

Again, pure speculation.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 7, 2019 via email

@jorisvandenbossche
Copy link
Member

So the thought was that if maybe_cythonize being called before setup was a problem because dependencies were not installed, this might be fixable by calling install_declared_dependencies at the top of maybe_cythonize.

I don't think we want to start meddling with how setuptools installs dependencies. It seems our use of the custom cmd_class made this work before (the check for where numpy's include path is also done in there), so maybe we will need to revert the use of cythonize? (or at least make it conditional on the presence of already compiled c files)

We should probably also look again into PEP517/518 support with a pyproject.toml, as it seems that support for that in pip has improved a lot since last time we tried (and reverted).

We also need to update the release infra or testing to ensure that the pip test environment does not have Cython installed.

Yes, was thinking the same. But didn't we already have a test for pip installing from source in https://github.com/pandas-dev/pandas-ci ?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 7, 2019 via email

@xhochy
Copy link
Contributor

xhochy commented Feb 8, 2019

For declaring build time requirements there is the pyproject.toml nowdays: https://www.python.org/dev/peps/pep-0518/ It behaves very similar to setup_requires except that pip can also deal with. I think setup_requires installed dependencies through distutils. pyproject.toml specified build requirements are installed before setup.py is called.

I'll submit a PR.

@jorisvandenbossche
Copy link
Member

@xhochy yes, we did that before, but reverted it after lots of troubles (#20775, #20718). But as I said above, we should probably look into it again, since it seems that most of the problems we had last year are more or less solved in pip nowadays. I am only not fully sure if we should do it for a bug-fix release though (we could still fix the setup.py for 0.24.x, and then try out the pyproject.toml again for 0.25)

@xhochy
Copy link
Contributor

xhochy commented Feb 8, 2019

@jorisvandenbossche I will dig a bit deeper into the pyproject.toml problems and see if I can get it working. There are some issues and lengthy bug reports I'm currently scanning but I like this approach and would love to see it working.

@jorisvandenbossche
Copy link
Member

Yes, for sure we should try to add it again (and thanks for looking into it!). I am mainly wondering if we should see it as a fix for this issue (for 0.24.x).

@jorisvandenbossche
Copy link
Member

@jbrockmendel would you have time to look into this? I might think that with some conditional check before calling the cythonize (whether both cython and numpy are installed, like we now already have a check that cython is installed), it might work

@jbrockmendel
Copy link
Member

@jorisvandenbossche sure. Is the desired behavior just to check if numpy is installed and if not raise?

@jorisvandenbossche
Copy link
Member

Not sure if it should raise. I think you should be able to call python setup.py .. without having numpy installed. So eg doing a python setup.py egg_info should not call cythonize / require numpy or cython to be installed (as it was before #21879)

@jbrockmendel
Copy link
Member

So eg doing a python setup.py egg_info should not call cythonize / require numpy or cython to be installed

OK. The fix that comes to mind is a check inside maybe_cythonize to skip cythonizing based on the command line arguments. At the moment we only skip for clean, but that could be extended to egg_info or whatever else (or the check could be reversed so cythonize is only run for build_ext or whatever)

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Feb 13, 2019

I don't know all the possible commands of setuptools enough to know if specifically checking for that would be enough. Best might be to simply test it out.

or the check could be reversed so cythonize is only run for build_ext or whatever

In any case, it seems to me that there is some duplication now in the setup.py, as we have both functionality to cythonize in the build_ext class and in the maybe_cythonize you added. But not familiar enough with it to know how it interacts with the coverage that you were trying to solve.

@jbrockmendel
Copy link
Member

But now familiar enough with it to know how it interacts with the coverage that you were trying to solve.

IIRC the coverage implementation isn't quite orthogonal to the usage of cythonize but close. I'm fairly confident that if we had to remove the usage of cythonize, we could do it without removing the coverage implementation.

In any case, it seems to me that there is some duplication now in the setup.py, as we have both functionality to cythonize in the build_ext class and in the maybe_cythonize you added.

Some of that code is shared. maybe_cythonize calls build_ext.render_templates. I guess the code adding numpy_incl to each ext.include_dirs could be shared. Do you see anything else that could be de-duplicated?

I don't know all the possible commands of setuptools enough to know if specifically checking for that would be enough

This is partially an upstream problem with cythonize. I implemented maybe_cythonize to wrap cythonize because python setup.py clean would incorrectly call cythonize if we didn't do a check. Unless you want to revert the use of cythonize (I really hope that's not on the table), I'm pretty sure the place to fix this issue are on line 476 in maybe_cythonize where it currently checks for "clean". We should also push the fix upstream (cython/cython#1495).

I also don't know the setuptools API. It shouldn't be that hard to find a complete-ish list of the possible commands and sort them by whether cythonize should be called or not, should it?

@jorisvandenbossche
Copy link
Member

Unless you want to revert the use of cythonize (I really hope that's not on the table)

Well, I don't care that much how it is done, I mainly care about it being fixed for 0.24.2. I only know that before #21879 it was working, and that I don't have the time to dive into setuptools inner details right now, so if we don't find another solution, reverting it is an option for me.
In any case, the maybe_cythonize is not essential to get a build_ext working. I just removed it, and doing a dev build on master still works fine (but I assume the need for the maybe_cythonized is somehow related to the coverage?).

But the suggestion about checking which setup command is ran, might make sense (at least, eg scipy does things like that: https://github.com/scipy/scipy/blob/master/setup.py)

This is partially an upstream problem with cythonize.

Or you could also say that their recommendation to use cythonize is not a very good one, as mentioned in the last comment on the issue you linked (in their docs, they actually also mention some ways to avoid calling cythonize in certain cases). We already have all the custom build classes (CleanCommand, build_ext, ..) that deal with this, and the cythonize/maybe_cythonize seems to be duplicating that partly (but again, I don't have much knowledge of setuptools/cython interaction)

@jbrockmendel
Copy link
Member

But the suggestion about checking which setup command is ran, might make sense

I maintain this is the thing to do. Go for it.

In any case, the maybe_cythonize is not essential to get a build_ext working. I just removed it, and doing a dev build on master still works fine (but I assume the need for the maybe_cythonized is somehow related to the coverage?).

cythonize takes care of figuring out the dependencies that were previously explicitly (and error-proned-ly) listed in setup.py. Using it simplifies that part of setup quite a bit.

@jorisvandenbossche
Copy link
Member

But the suggestion about checking which setup command is ran, might make sense

I maintain this is the thing to do. Go for it.

I currently don't have time to look more into it.

@TomAugspurger
Copy link
Contributor

I'll look into this today, as I think this is a blocker for 0.24.2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms Dependencies Required and optional dependencies Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants