Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #2478: Topological installation order #2616

Merged
merged 3 commits into from
Apr 1, 2015

Conversation

rbtcollins
Copy link

This fulfils the request in Issue #2478 - install dependencies in topological order.

The finished order is:

  • anything install_require'd before the requiring thing
  • as given for user supplied requirements where compatible with the install-requiring constraint

Examples:
pip install foo bar, where foo depends on bar will install bar, then foo.
pip install foo bar, where foo depends on baz will install baz, foo, bar.

@xavfernandez
Copy link
Member

I don't understand why we need a topological ordering of installs since it has been agreed that setup_requires won't end up in the target environment ?

name='TopoRequires4',
version='0.0.1',
packages=['toporequires4'],
install_requires=['TopoRequires2', 'TopoRequires3', 'TopoRequires4'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per your readme update, the 3rd item should be "TopoRequires", right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nuts,yes.

@rbtcollins
Copy link
Author

@xavfernandez this has nothing to do with setup_requires. Topological sorting is an existing feature request predating my work on setup_requires, but which increases the reliability of building and installing things - because there are setup.py's where egg_info can be run, but install/bdist_wheel etc can fail if the dependencies have not been installed. Because we can't have nice things. This is why folk have had to tweak install orders in the past, because pip was installing in 'first-found' or 'last-found' order, neither of which solve this generally.

@xavfernandez
Copy link
Member

there are setup.py's where egg_info can be run, but install/bdist_wheel etc can fail if the dependencies have not been installed.

This is the goal of setup_requires.

I'm afraid implementing topological ordering will send this message to users: "You don't need to declare setup_requires dependencies, pip should figure it for you".

@rbtcollins
Copy link
Author

I don't dispute that. But if the pip committers didn't want topological sorting they would have closed that issue WONTFIX, right?

@dholth
Copy link
Member

dholth commented Mar 31, 2015

+1 on toposort. No one should have to manually order their dependencies.

@xavfernandez
Copy link
Member

+1 on toposort. No one should have to manually order their dependencies.

No one has. People only have to use the tools correctly (i.e. use setup_requires if their setup.py requires something special).

This is even less useful/desirable with wheel-caching coming: it will be much less painful to have numpy as setup_requires and install_requires, since it will only be built once as wheel.

@dholth
Copy link
Member

dholth commented Mar 31, 2015

Topological sorting only means that all of a package's requirements get installed before that package is installed. It does not have anything to do with setup requirements.

@qwcode
Copy link
Contributor

qwcode commented Mar 31, 2015

It does not have anything to do with setup requirements.

practically speaking, it does. to say again, this PR intends to resolve #2478, which was motivated by a user wanting a topo sort for the example of "h5py" needing "cython" to run a build extension. A topo install is not needed if "h5py" setup-requires "cython". Although much of the conversation for "setup-requires" has focused around wanting to do module level imports in setup.py itself, this scenario of needing requirements in place for build extensions is something that is literally mentioned in the setuptools docs: " [setup_requires] is needed if you are using distutils extensions as part of your build process".

@pfmoore
Copy link
Member

pfmoore commented Mar 31, 2015

To put it another way: Any given pip installation should be seen as atomic - everything requested (and their dependencies) gets installed at once, or not at all. The order pip does the tasks it needs to to achieve that is an implementation detail, and should not matter. If it does, that is a bug in the packages and how they have specified their requirements. All the user should have to care about is that when they do pip install X Y Z, something happens and when it is finished, X, Y and Z are available and work (in terms of all their dependencies are present). Or there's an error and nothing gets installed.

@pfmoore
Copy link
Member

pfmoore commented Mar 31, 2015

... which is not to say that this change won't help users deal with broken package specs, but that's all it does.

On its own, this is a fix for a regression vs master, as it passes
tests. It is needed for topological handling, as we need to build a
dependency graph, and can't do that without resolving unnamed -> named
dependencies before adding the depended-on requirements.
This is for reuse in the declarative requires patch.
@rbtcollins
Copy link
Author

Ok, so I think I've figured out some of the cognitive dissonance here. Coming from a distro background it didn't even occur to me until I got up this morning. Package managers like apt and yum (and their deeper plumbing, dpkg and rpm) try very hard to avoid leaving the users system in a broken state. pip also has this goal - see for instance @dstufft 's requirement that we don't install anything until we know we can resolve all the dependencies satisfactorily.

The reason for topological sorting then, is not so that buggy setup_requires will work, but correctness: we should not have packages installed without their dependencies. topological sorting is a first approximation to this goal.

A second, closer approximation would involve attempting to maintain axioms across all installed packages e.g. when A and B both have versioned deps on C, don't allow an upgrade or install or B if that would break the versioned dep A holds, even transiently.

This can be made easier by a third thing - atomic replacement of packages, which requires making the installation and removal of packages atomic, which its not today. But I don't want to get into perfection here; acting in topological order is clearly much closer to 'all axioms maintained at all times' than installing in either first-discovered or last-discovered orders.

@dstufft
Copy link
Member

dstufft commented Mar 31, 2015

In particular, a problem that happens is if you do pip install thing-that-depends-on-foo, and you install thing-that-depends-on-foo before you install foo, then if foo fails to install for whatever reason (perhaps missing dev headers?) you cannot repeat the instal command to get it to install everything else because thing-that-depends-on-foo is already installed.

There are of course other ways of handling that, like making pip install <whatever> re-evaluate the dependencies to make sure there are no missing dependencies (which isn't itself a bad idea), but installing things in this order just makes sense I think.

An option to be more "atomic", is to have the other PR that @rbtcollins is working on (the auto build wheel PR) build all the wheels first, before installing any wheels. That would move us into a situation where we ensure we can build everything before we attempt to install anything. That isn't particularly relevant to this PR though, which I think is an entirely reasonable thing to do.

@rbtcollins
Copy link
Author

@dstufft actually we re-evaluate the whole dep chain 'pip install thing-with-missing-dep' will install the missing dep. But its better if thing-with-missing-dep not be installed at all, of course :)

@xavfernandez
Copy link
Member

In particular, a problem that happens is if you do pip install thing-that-depends-on-foo, and you install thing-that-depends-on-foo before you install foo, then if foo fails to install for whatever reason (perhaps missing dev headers?)

Thanks, that's indeed a good justification for this PR. I thought pip was able to deal with that in an atomic way and rollback the whole thing.

Indeed the solution to build all the wheels first before installing anything seems like an ideal goal 👍

This is needed for setup-requires, since without it its possible
to cause installation to fail in sort-circuit scenarios such as
the added functional test case demonstrates.
@qwcode
Copy link
Contributor

qwcode commented Mar 31, 2015

a problem that happens is if you do pip install thing-that-depends-on-foo,
and you install thing-that-depends-on-foo before you install foo,
then if foo fails to install for whatever reason (perhaps missing dev headers?)
you cannot repeat the instal command to get it to install everything
else because thing-that-depends-on-foo is already installed.

what? pip handles this now. pip will see thing-that-depends-on-foo is already installed, but still proceed to install foo

@dstufft
Copy link
Member

dstufft commented Mar 31, 2015

I'm remembering some of the details wrong, the issue was #2151 but I don't recall what exactly it was now... but I remember that reversing the installation order so that dependencies got installed before the thing that depended on them solved it.

@qwcode
Copy link
Contributor

qwcode commented Mar 31, 2015

we should not have packages installed without their dependencies

more generally, we shouldn't partially fulfill a user's install request. all or nothing, i.e. support rollbacks.

but I agree, this is an approximation, and like I said, I don't dislike the topo sort, I'm just still not a fan of the wording in the docs and docstrings. it's going to promote people thinking the sort has to be this way just due to the dep chain itself (i.e. that pip can't install A before B if A install-requires B)

@rbtcollins
Copy link
Author

But pip shouldn't install A before B if A install-requires B, because that isn't a valid install of A. What exact prose would you like in the docstring - I don't want to try and guess at what will be acceptable to you.

dstufft added a commit that referenced this pull request Apr 1, 2015
Issue #2478: Topological installation order
@dstufft dstufft merged commit c85f1d6 into pypa:develop Apr 1, 2015
@qwcode
Copy link
Contributor

qwcode commented Apr 1, 2015

as noted in IRC, I'll log another PR with ideas on changing the docs and comments

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants