how to handle numpy.distutils and setuptools interaction #2372

Open
rgommers opened this issue Sep 2, 2020 · 17 comments

@rgommers

rgommers commented Sep 2, 2020

Hi setuptools devs, I'd like to use this issue to summarize issues in the interaction between setuptools and numpy.distutils, and then see if we can find a way to resolve those structurally.

The setuptools 50.0 release gave NumPy and SciPy (a heavy numpy.distutils user) a number of concrete problems:

All of the above are resolvable. The bigger issue here is the interaction between distutils, numpy.distutils and setuptools. Due to setuptools release cadence, such breaks will keep on happening due to incompatibilities between numpy.distutils and setuptools. Let me summarize the issue, and then some options for dealing with it.

Old situation, for setuptools <50.0:

  1. numpy.distutils extends and monkeypatches distutils.
  2. the import order in NumPy and SciPy setup.py is:
     import setuptools  # unused, just to let setuptools do its monkeypatching dance first
     from numpy.distutils.core import setup

This situation worked reasonably well, because:

  • distutils moved slowly
  • larger distutils changes typically showed up in pre-releases of new Python versions, or in bugfix releases (say 3.7.7) that vendors like Anaconda qualify before they ship it, so we could adjust numpy.distutils to those changes
  • setuptools didn't do all that much that was relevant build-wise

New situation, for setuptools 50.0:

(default situation, no env var set)

  1. setuptools replaces distutils with its vendored setuptools._distutils, even if plain distutils is imported first
  2. numpy.distutils is unchanged, so it still extends and monkeypatches distutils - which is now setuptools._distutils

So numpy.distutils finds itself monkeypatching setuptools code all of a sudden, and that setuptools code includes patches from Python 3.9-dev that are therefore now released into the wild with new setuptools releases without any alpha/beta/QA trajectory like we had before.
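
A minimal sketch of how one might check which copy of distutils ends up active in a given environment. The helper names are hypothetical, and the path test is only a heuristic that assumes the vendored copy lives under a `setuptools/_distutils` directory, as described above:

```python
import importlib
import os


def classify_distutils_origin(module_file):
    """Guess where a distutils module was loaded from, based on its file path.

    Heuristic only: a path containing a `setuptools/_distutils` component is
    treated as the copy vendored by setuptools; anything else as the stdlib.
    """
    normalized = os.path.normpath(module_file).replace(os.sep, "/")
    if "/setuptools/_distutils/" in normalized:
        return "setuptools (vendored)"
    return "stdlib"


def report_active_distutils():
    """Import distutils if it is available and report which copy is active."""
    try:
        distutils = importlib.import_module("distutils")
    except ImportError:
        # No distutils can be imported at all in this environment.
        return "unavailable"
    return classify_distutils_origin(getattr(distutils, "__file__", "") or "")


if __name__ == "__main__":
    print(report_active_distutils())
```

Running this before and after `import setuptools` in the same environment is one way to see whether the replacement described in step 1 has taken effect.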

Releasing new patches quickly without testing will probably still be an issue for numpy.distutils, even if the SETUPTOOLS_USE_DISTUTILS="local" behaviour gets reverted for the time being (which seems likely, since there are a ton of other issues right now, like the Debian breakage).

What now?

Longer-term I don't think it's feasible for numpy.distutils to work as it does today and extend setuptools; the release cycle mismatch will be too much of a problem. I'm not quite sure what the best solution for that is though. Options could include:

  • for every NumPy release, add a hard setuptools <= current_setuptools_version constraint. NumPy releases get supported for 2.5 years by most of the scientific ecosystem (i.e. ~50% of Python's user base), though, so that would mean pretty old setuptools versions for building, e.g., wheels of packages that rely on NumPy. Example: scikit-learn needs to build with NumPy 1.13.3 right now, so it would get setuptools 36.5.
  • make numpy.distutils a standalone package on top of setuptools (then at least release cycle can be matched, but situation remains unstable)
  • integrate numpy.distutils into setuptools (more testing will then be done before setuptools releases and no extending/monkeypatching is needed, but is it a good idea in terms of sharing knowledge and workload?)

Let me emphasize that I do see the upsides in merging distutils into setuptools, both design-wise and maintenance-wise. And we (NumPy & scientific packages) have been burned in the past by distutils patches going unmerged for ages, so setuptools being better maintained is great.

Also, for context, it may be useful to list what numpy.distutils adds besides carrying around needed distutils patches:

  • Fortran build support
  • SIMD intrinsics support
  • BLAS/LAPACK library support (OpenBLAS, MKL, ATLAS, Netlib LAPACK/BLAS, BLIS, 64-bit ILP interface, etc.)
  • Support for a few other scientific libraries, like FFTW and UMFPACK (less often used)
  • Better MinGW support
  • Per-compiler build flag customization (e.g. -O3 and SSE2 flags are default)
  • EDIT: a simple user build config system, see site.cfg.example

Looking forward to hearing your thoughts on this topic.

@bashtage

bashtage commented Sep 2, 2020

It also resulted in a build failure on Windows for statsmodels, see statsmodels/statsmodels#7016.

@zooba
Contributor

zooba commented Sep 4, 2020

Just to add some extra info (which I already posted on one of the related threads, but it belongs here):

The distutils/setuptools merge was done with the full blessing of the core CPython team, and we plan to deprecate (in 3.10) and remove (in 3.12) distutils from the standard library completely. The specific versions may vary (I'm writing the PEP now), but the overall plan is uncontroversial.

I don't think we'd have any concerns if numpy.distutils also took a copy of the current distutils code, or one of the other options. Bear in mind that it should be nearly feasible to put up the build tool as its own package and use PEP 517 to bring it in, which could remove setuptools completely from your equation (though I am very much aware of the other issues that numpy et al. face with pip's implementation of PEP 518).

Best of luck sorting this out! Sorry that it showed up as failures like this.

@jaraco
Member

jaraco commented Sep 4, 2020

I have not looked into the details, but my instinct is that a combination of some options would be best:

  • adapt distutils patches to be generally useful (or selectively enabled) and contribute those to pypa/distutils or setuptools.
  • provide setuptools extensions to implement additional functionality where appropriate; setuptools could add extendable hooks if that helps.
  • rely on PEP-517 to declare supported versions of Setuptools to govern the speed of adopting changes.

To be sure, the plan is for setuptools not to expose 'distutils' long-term. Soon after it can safely own the code, it will deprecate imports of distutils and present its own imports of the needed interfaces (i.e. distutils.core.setup -> setuptools.setup, etc), so whatever we can do to support building numpy/scipy through long-term interfaces would be preferable.

* [numpy/numpy#17209](https://github.com/numpy/numpy/pull/17209), CI break in NumPy, `from distutils import sysconfig` broken on TravisCI

I don't understand the failure here. Would you consider filing a bug with this, either with setuptools or pypa/distutils, especially if you have a way to replicate the failure?

@rgommers
Author

rgommers commented Sep 4, 2020

I don't think we'd have any concerns if numpy.distutils also took a copy of the current distutils code, or one of the other options. Bear in mind that it should be nearly feasible to put up the build tool as its own package and use PEP 517 to bring it in, which could remove setuptools completely from your equation (though I am very much aware of the other issues that numpy et al. face with pip's implementation of PEP 518).

I'm not worrying too much about how to bring the build tool in. We will have to support pip install <package_name> and python setup.py develop at least (pip's editable installs don't quite cut it). And we have thousands of downstream users of numpy.distutils, so we can't just switch to a completely different method like scikit-build or Bento.

Hence the choice is indeed to either vendor distutils, or have a dependency on setuptools. I'd prefer the latter, because the non-distutils part of setuptools does offer some functionality that people want and rely on, and syncing distutils patches between setuptools and our vendored copy would also be a pain.

I have not looked into the details, but my instinct is that a combination of some options would be best:

Makes sense.

* adapt distutils patches to be generally useful (or selectively enabled) and contribute those to pypa/distutils or setuptools.

I assume you're not interested in adopting any of the main features of numpy.distutils I listed, except for better MinGW support?

Maybe better CPU feature detection makes sense too?

* provide setuptools extensions to implement additional functionality where appropriate; setuptools could add extendable hooks if that helps.

* rely on PEP-517 to declare supported versions of Setuptools to govern the speed of adopting changes.

The one annoyance there is that, unless we split out numpy.distutils into its own package, we don't want to add a runtime dependency on setuptools, hence declaring those supported versions will be a matter of putting it in the NumPy release notes and manually adding it to pyproject.toml of every downstream user. Maybe that is a good reason to do that splitting off into a separate package.

To be sure, the plan is for setuptools not to expose 'distutils' long-term. Soon after it can safely own the code, it will deprecate imports of distutils and present its own imports of the needed interfaces (i.e. distutils.core.setup -> setuptools.setup, etc), so whatever we can do to support building numpy/scipy through long-term interfaces would be preferable.

I think the details of that plan will be very useful to figure out what to do here. For example:

  • Will all of distutils.command be merged into setuptools.command mostly unchanged (e.g. command.config is missing in setuptools now)?
  • Will compiler support go into setuptools submodules and will you keep all of it with the distutils names (e.g. distutils.msvc9compiler -> setuptools.msvc9compiler and get rid of setuptools/msvc.py)?
  • Do you have an estimate for timeline? If it's a few months we can simply wait till the dust settles, and then adjust based on the new shape things have taken; if it's >1 year we may be adjusting while you are migrating things, which could be more complicated.

I don't understand the failure here. Would you consider filing a bug with this, either with setuptools or pypa/distutils, especially if you have a way to replicate the failure?

Looks like the cause is distutils.sysconfig having moved to sysconfig in the stdlib, but not completely - so now we need pieces of both. Work ongoing in numpy/numpy#17223 to sort it out on the NumPy end, we'll open an issue if there's a problem left after that's done.
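
To illustrate the "pieces of both" situation, here is a hedged sketch of build code that prefers the stdlib `sysconfig` module and only falls back to `distutils.sysconfig` when needed. It is not the approach taken in numpy/numpy#17223; the function names are hypothetical, and only `sysconfig.get_path`, `sysconfig.get_config_var` and `distutils.sysconfig.get_python_inc` are real calls:

```python
import sysconfig


def python_include_dir():
    """Return the directory holding Python.h, preferring the stdlib sysconfig."""
    include_dir = sysconfig.get_path("include")
    if include_dir:
        return include_dir
    # Fallback for setups where the stdlib answer is missing; distutils may
    # itself be unavailable, hence the guarded import.
    try:
        from distutils import sysconfig as distutils_sysconfig
    except ImportError:
        raise RuntimeError("no include directory found and distutils is unavailable")
    return distutils_sysconfig.get_python_inc()


def extension_suffix():
    """Return the filename suffix used for compiled extension modules."""
    return sysconfig.get_config_var("EXT_SUFFIX")


if __name__ == "__main__":
    print(python_include_dir())
    print(extension_suffix())
```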

@jaraco
Member

jaraco commented Sep 5, 2020

I assume you're not interested in adopting any of the main features of numpy.distutils I listed, except for better MinGW support?

Maybe better CPU feature detection makes sense too?

If the behaviors are generally valuable and can be implemented in a way that's not disruptive of supported use-cases, I've no objection to incorporating any number of features.

@jaraco
Member

jaraco commented Sep 5, 2020

I think the details of that plan will be very useful to figure out what to do here.

I agree these are good questions. My plan was to address issues like these incrementally, as needed. First step will be creating suitably-compatible versions of public interfaces entirely in the setuptools namespace and weaning users and packages off of import distutils*.

  • Will all of distutils.command be merged into setuptools.command mostly unchanged (e.g. command.config is missing in setuptools now)?

Almost certainly.

  • Will compiler support go into setuptools submodules and will you keep all of it with the distutils names (e.g. distutils.msvc9compiler -> setuptools.msvc9compiler and get rid of setuptools/msvc.py)?

Maybe. Here we'll need to explore what interfaces the users need for these modules. Ultimately, I'd like to consolidate a lot of these behaviors, but it may be necessary to maintain some legacy interfaces. More planning and design is needed here.

  • Do you have an estimate for timeline? If it's a few months we can simply wait till the dust settles, and then adjust based on the new shape things have taken; if it's >1 year we may be adjusting while you are migrating things, which could be more complicated.

I was hoping O(weeks) to have distutils adopted, but it's proven more difficult (mostly due to system package manager patches), and it's not obvious to me how fast that blocker can be cleared. After full adoption is the norm, I expect to perform a refactoring every few weeks. I think it's possible to take more than 1 year, but more likely 6-9 months would be my guess.

@rgommers
Author

rgommers commented Sep 5, 2020

If the behaviors are generally valuable and can be implemented in a way that's not disruptive of supported use-cases, I've no objection to incorporating any number of features.

Thanks @jaraco, that helps. For now I won't bother you to think about things like Fortran compiler support or linear algebra libraries, but it's good to know you're open to new features if they can be fit in in a clean, non-disruptive way.

I think it's possible to take more than 1 year, but more likely 6-9 months would be my guess.

Given our (low) bandwidth for working on numpy.distutils and difficulty in testing N1 platforms x N2 compilers x N3 linalg libraries, I'm inclined to wait those 6-9 months and just keep an eye on how things go.

More planning and design is needed here.

If you need input from the NumPy side on particular design decisions or on a design document, please feel free to ping me any time.

@mattip
Contributor

mattip commented Sep 5, 2020

@pv

pv commented Sep 8, 2020

One comment about vendoring: it probably would not be sufficient for numpy.distutils to vendor only distutils, as IIRC things such as proper pip/wheel/MSVC support come from setuptools. This relies on setuptools monkeypatching the distutils command/compiler framework, which presumably is now on the table for refactoring and probably contains brittle things that refactoring can break, so would mixing "frozen" distutils and new setuptools eventually stop working? If so, it seems vendoring would imply forking distutils+setuptools and keeping the forks on zombie life support. (I'm not sure how this plays together with pip importing setuptools.)

Maybe such forks can be kept frozen for a long time? I'm not sure how many distutils/setuptools fixes are essential to keep things building on new Python releases. This also bears on how long NumPy and other packages depending on numpy.distutils can continue pinning to a pre-50 setuptools version.

(The above point may have some relevance also for the discussion about removal of distutils from stdlib, as existing setup.py may rely both on setuptools and "old" distutils features to work together properly. But this discussion probably should be continued elsewhere.)

Adapting numpy.distutils to a public API that a refactored setuptools provides would be simpler in the long run once we get there. However, in this case keeping close to 100% backward compatibility for existing setup.py files sounds challenging, especially if the distutils refactoring is significant.

For integrating numpy.distutils features into setuptools: most projects using numpy.distutils probably mainly need the Fortran compiler support and features associated with that, and not much else. However, as with distutils, if backward compatibility is going to be broken, there are quirks that should be ironed out in the functionality, and sorting that out takes time.

@jaraco
Member

jaraco commented Jan 23, 2021

Given our (low) bandwidth for working on numpy.distutils and difficulty in testing N1 platforms x N2 compilers x N3 linalg libraries, I'm inclined to wait those 6-9 months and just keep an eye on how things go.

In that case, should Setuptools consider NumPy a non-blocker for making SETUPTOOLS_USE_DISTUTILS=local the default (requiring numpy builds to either override the value to stdlib or otherwise avoid those releases)? I'm okay with that, and it allows Setuptools to focus on the Debian/Fedora patches to arrive at a solution exclusive of NumPy and proceed with adoption.
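
As a rough sketch of the override mentioned here, a build script could prepare an environment that requests the stdlib copy before launching the build. The helper name is hypothetical. The variable name and the values "stdlib" and "local" are the ones discussed in this thread, and the override can only help on Python versions that still ship distutils:

```python
import os


def build_env_with_stdlib_distutils(base_env=None):
    """Return a copy of the environment that asks setuptools for stdlib distutils."""
    env = dict(os.environ if base_env is None else base_env)
    # "stdlib" and "local" are the two values discussed in this thread.
    env["SETUPTOOLS_USE_DISTUTILS"] = "stdlib"
    return env


if __name__ == "__main__":
    print(build_env_with_stdlib_distutils({"PATH": "/usr/bin"}))
```

The returned mapping could then be passed as `env=` to `subprocess.run` when invoking the build, so the setting does not leak into the parent process.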

@rgommers
Author

Thanks for asking @jaraco. NumPy itself is already pinning to <49.2.0, and the latest SciPy release pins to <= 51.0.0. I think all projects should start doing this - keeping latest setuptools in CI for as long as possible, while pinning setuptools in their releases. I suspect most scientific libraries don't do that right now in their pyproject.toml, so a pip install pkgname --no-binary may break. But long-term that seems inevitable anyway, and it doesn't affect many end users given that there are wheels for all common platforms. So I'd say just go ahead.
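
To make the two pins concrete, here is a simplified sketch of how an upper bound such as `<49.2.0` or `<=51.0.0` behaves. The function names are hypothetical, and the parser is an assumption for illustration only: it handles plain dotted release numbers, not pre-releases or other version syntax that a real tool would need to cover:

```python
def parse_release(version):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Simplification: pre-release, post-release and local version segments
    are not handled; a real project would use a proper version parser.
    """
    return tuple(int(part) for part in version.split("."))


def within_upper_bound(version, bound, inclusive=False):
    """Check `version < bound` (or `<=` when inclusive) on release tuples."""
    v, b = parse_release(version), parse_release(bound)
    # Pad to equal length so that '51.0' and '51.0.0' compare as equal.
    width = max(len(v), len(b))
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    return v <= b if inclusive else v < b


if __name__ == "__main__":
    print(within_upper_bound("50.0", "49.2.0"))                  # NumPy-style pin
    print(within_upper_bound("51.0", "51.0.0", inclusive=True))  # SciPy-style pin
```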

@isuruf
Contributor

isuruf commented Dec 22, 2021

@jaraco, what's your feeling on moving the Fortran compiler support from numpy.distutils to distutils?

Also, can you move this issue to pypa/distutils repo?

@rgommers
Author

I'd prefer not to move this issue - this is between the Setuptools and NumPy projects, so having cross-linked issues between those two projects seems right to me. For plain distutils this is kind of out of scope.

@jaraco, what's your feeling on moving the Fortran compiler support from numpy.distutils to distutils?

xref @jaraco's earlier answer: #2372 (comment). Would be good to know if that changed in the meantime, but I'd expect that it didn't.

@pradyunsg
Member

One thing I will mention (since confusion around this has been stated): pip will always import setuptools before running setup.py. Thus, the positioning/ordering of import setuptools in a setup.py doesn't matter.
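
The reason the ordering stops mattering can be sketched with Python's module cache. This is not pip's actual shim (see the link that follows for that); it is a hypothetical stand-in where `colorsys` plays the role of setuptools so the example runs without setuptools installed:

```python
import importlib
import sys


def run_after_preimport(module_name, script_source):
    """Import `module_name`, then execute `script_source` as a script.

    Returns whether the module object the script obtained from its own
    `import` statement is the very object that was imported up front.
    """
    preloaded = importlib.import_module(module_name)
    namespace = {"__name__": "__main__"}
    exec(compile(script_source, "<setup.py>", "exec"), namespace)
    return namespace[module_name] is preloaded is sys.modules[module_name]


# Stand-in script: the module of interest is imported second, not first.
SCRIPT = "import string\nimport colorsys\n"

if __name__ == "__main__":
    print(run_after_preimport("colorsys", SCRIPT))
```

Because the script's own import only fetches the already-loaded module from `sys.modules`, any import-time side effects have happened before the script starts, wherever its import statement sits.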

https://github.com/pypa/pip/blob/0a21080411c25acfb87fbc380631806e0477d7d3/src/pip/_internal/utils/setuptools_build.py#L5-L46

@rgommers
Author

I see that the change in plans for NumPy hasn't yet been posted here, so let me do so now. numpy.distutils is deprecated, and will go away for Python releases where plain distutils goes away. Users can migrate to another build system, or help add the feature(s) they need to setuptools. See https://numpy.org/devdocs/reference/distutils_status_migration.html

@pradyunsg
Member

pradyunsg commented Sep 27, 2022

Is there any timeline for the plans to move numpy itself away from trying to use setuptools < 60 as its build system?

@rgommers
Author

The planned timeline is "by the time we need it for Python 3.12", because we kinda have to. It's still a big job though, so it depends on when we can make some dedicated time for the right person(s).

I made a start in https://github.com/rgommers/numpy/tree/meson, and the configure checks turned out to be a lot easier than with distutils. Compiler support is mostly figured out too, because that's common with SciPy. The main sticking point will be SIMD support, see numpy.distutils.ccompiler_opt and this diagram in the docs.
