
Get deps from pypi during build #3310

Merged: 22 commits merged into pyodide:main on Jan 4, 2023

Conversation

@joemarshall (Contributor) commented Dec 1, 2022

Description

Further extension to the work on pyodide build. This adds two new arguments:

--build-dependencies

Get the dependency list from the wheel and build (or download) wheels for all dependencies alongside the main wheel.

--skip-dependency NAME
Skip a dependency. NAME can be either a package name or the path to a JSON package list file, in either repodata.json format (i.e. built-in pyodide packages) or jupyterlite all.json format (i.e. a jupyterlite wheel folder).
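
For illustration, a hypothetical invocation (the skipped package name is made up; only the two flag names above come from this PR): running pyodide build in a package's source tree with --build-dependencies --skip-dependency numpy would build that package plus wheels for everything in its dependency tree except numpy, which is assumed to be provided elsewhere (e.g. as a built-in pyodide package).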

It uses resolvelib to do the dependency resolution because it can handle backtracking etc. If you call pyodide build to build something and then put all the wheel files into the jupyterlite pypi directory, it should now be able to load without hitting pypi (as piplite uses the locally provided version by default).
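
For readers unfamiliar with resolvelib: it drives the backtracking search itself and asks a user-supplied provider a small set of questions (identify a requirement, find matching candidates, report a candidate's dependencies). Below is a minimal, self-contained sketch of that pattern against a toy in-memory index; everything in it (ToyProvider, ToyCandidate, INDEX) is illustrative and is not the code added in this PR.

from dataclasses import dataclass

from packaging.requirements import Requirement
from packaging.version import Version
from resolvelib import AbstractProvider, BaseReporter, Resolver


@dataclass(frozen=True)
class ToyCandidate:
    name: str
    version: Version
    dependencies: tuple[str, ...]  # requirement strings


# Toy in-memory "index" standing in for PyPI / locally built wheels.
INDEX = {
    "spam": [
        ToyCandidate("spam", Version("1.0"), ("eggs>=2",)),
        ToyCandidate("spam", Version("2.0"), ("eggs>=3",)),
    ],
    "eggs": [
        ToyCandidate("eggs", Version("2.1"), ()),
        ToyCandidate("eggs", Version("3.0"), ()),
    ],
}


class ToyProvider(AbstractProvider):
    def identify(self, requirement_or_candidate):
        # Requirements and candidates are grouped by project name.
        return requirement_or_candidate.name

    def get_preference(self, identifier, resolutions, candidates, information, **kwargs):
        # Resolve the project with the fewest remaining candidates first.
        return sum(1 for _ in candidates[identifier])

    def find_matches(self, identifier, requirements, incompatibilities):
        reqs = list(requirements[identifier])
        banned = {c.version for c in incompatibilities[identifier]}
        return [
            c
            for c in sorted(INDEX[identifier], key=lambda c: c.version, reverse=True)
            if c.version not in banned and all(c.version in r.specifier for r in reqs)
        ]

    def is_satisfied_by(self, requirement, candidate):
        return candidate.version in requirement.specifier

    def get_dependencies(self, candidate):
        # In the real sdist case this is the expensive step: the wheel has to be
        # built before its dependency metadata exists at all.
        return [Requirement(d) for d in candidate.dependencies]


result = Resolver(ToyProvider(), BaseReporter()).resolve([Requirement("spam")])
print({name: str(c.version) for name, c in result.mapping.items()})
# {'spam': '2.0', 'eggs': '3.0'}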

Needs tests, but I thought it worth putting up as soon as it was working for people to see.

Checklists

  • Add a CHANGELOG entry
  • Add / update tests
  • Add new / update outdated documentation

@rth (Member) left a comment:

Thanks for working on this @joemarshall !

I'm wondering if we actually want to manage the dependency resolution ourselves. The alternative could be to use something like pip-tools to generate a dependency tree and output a requirements.txt with pinned versions that can then be built.

I guess here we could do a more accurate resolution for the emscripten_wasm32 platform, but I'm a bit concerned about the feature creep. In particular,
a) it might be better to list the packages to be built, so the user can double-check them before actually building them. So even if we implement this, I wonder if it wouldn't be better to write a requirements.txt file with some new CLI command and then run the build on that input.
b) there are a lot of ways in which dependency resolution could produce results different from what the user wanted. So we need to consider whether we want to deal with such issues in the future, or whether we consider this out of scope and rather rely on some existing tool to do some of this.

Overall I agree this is useful functionality, and I'm not opposed to this PR, but I just want to be sure we also consider existing alternatives (if any) that would allow solving part of this use case.



@contextmanager
def stream_redirected(to=os.devnull, stream=None):
Member:

Maybe you could use contextlib.redirect_stderr and redirect_stdout to avoid re-implementing it.

@joemarshall (Contributor, author):

contextlib doesn't redirect subprocess output - that's why this one is needed. Otherwise one would need to delve into anywhere subprocess is called (i.e. deep inside build) to redirect that.
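
For context, contextlib.redirect_stderr/redirect_stdout only swap the Python-level sys.stderr/sys.stdout objects; a subprocess writes to the underlying file descriptor and never sees that swap. Capturing child output means redirecting at the file-descriptor level, roughly like the sketch below (a simplified illustration, not the exact helper in this PR):

import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def fd_redirected(to=os.devnull, stream=sys.stderr):
    # Redirect `stream` at the file-descriptor level so that output written
    # by child processes (which inherit the fd) is captured too.
    stream_fd = stream.fileno()
    saved_fd = os.dup(stream_fd)  # keep a copy of the original descriptor
    try:
        with open(to, "w") as target:
            stream.flush()
            os.dup2(target.fileno(), stream_fd)  # point fd 1/2 at the target
        yield
    finally:
        stream.flush()
        os.dup2(saved_fd, stream_fd)  # restore the original descriptor
        os.close(saved_fd)


# Unlike contextlib.redirect_stderr, the child's stderr ends up in build.log:
with fd_redirected(to="build.log", stream=sys.stderr):
    subprocess.run(["ls", "/nonexistent-path"], check=False)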

@ryanking13 (Member), Dec 28, 2022:

Could you please write a comment about this for the people who read this code in the future?

@joemarshall (Contributor, author):

I was a bit torn about doing it like this with resolvelib (which is basically the resolution logic from pip made into a library) vs pip-tools (which uses more of pip internals, but is basically a similar thing).

I think the pain with pip-tools is that there are a bunch of use cases that it doesn't easily support which are really super-useful for emscripten. In particular:

  1. It doesn't support ignoring packages. Unless pyodide starts using pypi packages entirely and stops using the pyodide package repository, this is a real pain; and if the ignored packages have complex dependency trees, unwrapping them from the pip-tools output is painful. I also have the particular use case of packages that do multiprocessing or processor-specific things, and which therefore depend on compiled packages that will just never compile on emscripten.

  2. It doesn't really support cross-compiling. This isn't too bad right now, as most packages don't deliberately support emscripten, but with things like jupyterlite I think there will increasingly be packages that know about emscripten. For example, I've got a couple of packages that drop dependencies on emscripten where those dependencies are used to run servers or to do graphics.

  3. It doesn't support building emscripten wheels for sdist packages. This is important because
    a) a surprising number of packages are sdist-only anyway, and many more are effectively sdist-only on emscripten (no prebuilt emscripten wheels)
    b) you need to build the wheel before you can tell what the dependencies are, so you can't reasonably separate building wheels from resolving dependencies.

It's easy enough to make this logic output pinned requirements.txt files if required (and building from them is pretty trivial anyway given we support pypi builds already), but I do think having the core bit that calls resolvelib and makes wheels at the same time is important to support correctly resolving for emscripten.

I have to moan about Python packaging again, by the way. I get that in theory different builds could have different build-time generated dependencies, and that there are all sorts of build tools, but the fact that there is no standard metadata file included in an sdist that just specifies dependencies for each platform makes me sad. Having to compile wheels in order to find dependencies is a bit awful...
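
To make that concrete, here is a sketch of how dependencies can be read back out of a wheel once it has been built: the wheel's .dist-info/METADATA lists Requires-Dist entries whose environment markers can then be evaluated against an Emscripten-flavoured environment. The function name, the exact marker values, and the Python version below are illustrative, not the implementation in this PR.

import zipfile
from email.parser import Parser
from pathlib import Path

from packaging.requirements import Requirement

# Marker environment describing the *target* interpreter rather than the build
# host. The exact values for a Pyodide/Emscripten target are illustrative.
EMSCRIPTEN_ENV = {
    "sys_platform": "emscripten",
    "platform_system": "Emscripten",
    "platform_machine": "wasm32",
    "os_name": "posix",
    "python_version": "3.11",
    "python_full_version": "3.11.0",
    "implementation_name": "cpython",
    "platform_python_implementation": "CPython",
}


def wheel_dependencies(wheel_path: Path, extras: frozenset[str] = frozenset()) -> list[Requirement]:
    """Return the Requires-Dist entries of a built wheel that apply on Emscripten."""
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_name = next(n for n in wheel.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(wheel.read(metadata_name).decode())

    deps = []
    for line in metadata.get_all("Requires-Dist", []):
        req = Requirement(line)
        if req.marker is None or any(
            req.marker.evaluate(dict(EMSCRIPTEN_ENV, extra=extra))
            for extra in (extras or {""})
        ):
            deps.append(req)
    return deps


# e.g. wheel_dependencies(Path("dist/somepackage-1.0-py3-none-any.whl"))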

@joemarshall (Contributor, author):

> it might be better to list the packages to be built, so the user can double-check them before actually building them. So even if we implement this, I wonder if it wouldn't be better to write a requirements.txt file with some new CLI command and then run the build on that input.

The problem I ran into is packages that are sdist-only or don't have pyodide-specific wheels. There's no reliable way to determine requirements for an sdist other than to compile it. So my script to generate the requirements.txt had to build all the non-pure-Python packages anyway before it could output a set of resolutions, and doing it all in one go avoids doubling the time spent compiling.

Basically I think anything that produces the requirements.txt files will
a) have to be integrated with, and hence call into, the pyodide build system to make wheels during the resolution process, and
b) need to use some kind of resolver library, either resolvelib or pip-tools.

It could in theory be a separate pyodide command that does this, although that would lose the benefit of being able to build everything for a project from scratch while only building wheels once. I don't know how important that is.

@joemarshall (Contributor, author):

@rth Any thoughts on the above?

I had a bit of a play with pip-tools again, but until there's a full cross-build environment for pyodide including support for using pyodide built-in packages in pip, it just isn't trivial to do this using pip-tools. I tried with pyodide pyenv but pip didn't really work for me there.

You end up hacking an isolated virtual environment and then hacking sys to change the values of sys_platform etc., then telling it a whole load of packages are unsafe; and the unsafe list has to include any dependencies of any packages that you already have, as it still traverses dependencies of unsafe packages. It freaks me out slightly, as you end up relying on the internal behaviour of pip-tools, which in turn is tied to the internal behaviour of pip, neither of which really supports being used as a library rather than a script.

Personally I think that one script in pyodide build that does the downloading and building of dependencies (and of requirements.txt files) using libraries that are meant to be used for dependency resolution is probably less fragile and less work to maintain than a script to hack a virtual environment so that pip-tools works, which will end up needing maintenance as pip-tools upgrades things or changes behaviour.

It would be possible with a load of work on cross-env and pyodide to get things to a state where one could just build entirely using pip, but I think that's some way off? I really want reliable package building for complex dependency trees to be there soon, as it just makes everything so much simpler when porting anything complicated that uses multiple libraries across.

@hoodmane (Member):

> It would be possible with a load of work on cross-env and pyodide to get things to a state where one could just build entirely using pip, but I think that's some way off? I really want reliable package building for complex dependency trees to be there soon, as it just makes everything so much simpler when porting anything complicated that uses multiple libraries across.

I looked into this a bit and I agree it looks quite complicated and fragile. If there is an alternative to writing some complicated hack to make pip's internals understand what we need then it is preferable. It would be cool to get editable installs working correctly though. Currently with my venvs we can do editable installs of pure Python packages, but not of binary packages.

if TYPE_CHECKING:
    APBase = AbstractProvider[Requirement, Candidate, str]
Member:

What goes wrong if you write AbstractProvider[Requirement, Candidate, str] when not TYPE_CHECKING?
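
For readers wondering about the question above: the usual motivation for this pattern is that subscripting a class at runtime only works if the class is generic (or otherwise subscriptable) in the installed version of the library, whereas a subscript placed under TYPE_CHECKING is only ever seen by static type checkers. A sketch of the pattern, with a placeholder Candidate and an illustrative subclass name (whether resolvelib's AbstractProvider is subscriptable at runtime depends on the installed version):

from typing import TYPE_CHECKING, Any

from packaging.requirements import Requirement
from resolvelib import AbstractProvider

Candidate = Any  # placeholder for the project's own candidate type

if TYPE_CHECKING:
    # Seen only by type checkers: the precise generic alias.
    APBase = AbstractProvider[Requirement, Candidate, str]
else:
    # Seen at runtime: the plain class, so no subscript is attempted on
    # versions where AbstractProvider does not support it.
    APBase = AbstractProvider


class PyPIProvider(APBase):  # illustrative subclass name
    ...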

@joemarshall (Contributor, author):

@hoodmane I got round to doing the review changes for this. I also added thorough tests, which serve a very basic fake PyPI to test things like unresolvable requirements and complicated extras handling.

I think at the moment it feels reliable enough to merge. I have a couple of todos that would be worth adding, namely reading and resolving requirements.txt files and writing out a flat requirements.txt, but ideally I'd like to put those in a separate PR.

@hoodmane (Member):

Sounds good to me. This PR is already quite large; any additional features would be best left to follow-ups. I'll try to look this over one more time and merge it in a bit.

@hoodmane (Member):

Or you could merge it if you like.

@ryanking13 (Member) left a comment:

Thanks, @joemarshall. I have minor comments about some details. I wasn't following the full conversation in this PR, so I'll let others approve.

if not package_path.is_dir():
    # a pure-python wheel has been downloaded - just copy to dist folder
    shutil.copy(str(package_path), str(curdir / "dist"))
    print(f"Successfully fetched: {package_path.name}")
    return
return curdir / "dist" / package_path.name
Member:

Maybe make this into a variable and reuse in both shutil.copy and return value?
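
One way that suggestion could look, assuming the copied wheel's destination path is also the value to return (the variable name dest and the exact control flow here are illustrative):

dest = curdir / "dist" / package_path.name
if not package_path.is_dir():
    # a pure-python wheel has been downloaded - just copy to dist folder
    shutil.copy(str(package_path), str(dest))
    print(f"Successfully fetched: {package_path.name}")
    return dest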

)
def test_host_in_node_test(runtime, fn, tmp_path):
    if runtime.startswith("node"):
        fn(tmp_path)
@ryanking13 (Member), Dec 28, 2022:

These tests are not actually running. I am trying to fix the broken CLI tests in #3390. You may need to fix these tests in the same way as in #3390 (probably after it is merged).

@joemarshall (Contributor, author):

Hi @hoodmane, I've done the review changes above now. If you could merge, that would be great; I don't have merge access to this repo.

@ryanking13 (Member) left a comment:

Thanks again for your work @joemarshall. Please update the changelog, and some remaining test issues need to be fixed.

@joemarshall (Contributor, author):

@ryanking13 Thanks, done that now.

@joemarshall (Contributor, author):

Damn, something is broken in the node tests; I'll sort that tomorrow.

@joemarshall (Contributor, author):

@ryanking13 The tests now run correctly in CI.

@ryanking13 (Member) left a comment:

Thanks!

@ryanking13 ryanking13 merged commit 76581a7 into pyodide:main Jan 4, 2023
hoodmane added a commit to hoodmane/pyodide that referenced this pull request Apr 6, 2023
This switches to passing the source and build directories as arguments.
It adds an output-directory argument to pyodide build allowing us to
indicate where the output wheel should go independent of the build
directory. I also did some cleanup of the logic added in pyodide#3310
hoodmane added a commit that referenced this pull request Apr 12, 2023
This switches to passing the source and build directories as arguments.
It adds an output-directory argument to pyodide build allowing us to
indicate where the output wheel should go independent of the build
directory. I also did some cleanup of the logic added in #3310
hoodmane added a commit to hoodmane/pyodide that referenced this pull request Apr 12, 2023
…ide#3746)

This switches to passing the source and build directories as arguments.
It adds an output-directory argument to pyodide build allowing us to
indicate where the output wheel should go independent of the build
directory. I also did some cleanup of the logic added in pyodide#3310