Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using virtualenv for managing upstream Python dependencies #8392

Open
EricCousineau-TRI opened this issue Mar 16, 2018 · 18 comments
Open
Assignees

Comments

@EricCousineau-TRI
Copy link
Contributor

EricCousineau-TRI commented Mar 16, 2018

Per @rdeits's comment in #8352:

Also, problems with homebrew and system python conflicts could be completely eliminated by using python's virtual environments (which are even built-in on python3.5 and above). I can't recommend that highly enough.

My 2 cents:

virtualenv does sound like a good option.

I'm not sure if I'm a fan of our present re-inventing of pip in Bazel; understandably, it's for reproducibility and keeping most of the dependencies in Bazel, but seems like it may be brittle when the wheels are more complex or have more comprehensive dependencies (e.g. numpy). We can always teach Bazel to scan and incorporate the upstream Python dependencies for hermetic-ness determinism with remote caching.

Additionally, if the custom-dtype solution for #8116 ends up working, it would greatly smooth out having an optionally patched version of numpy for memory management.

That being said, virtualenv is additional tweaking on the environment that does have a bit of a funny (and semi-invasive) smell to it. However, if it reduces headaches (e.g. if we can figure out how to teach nlopt from Homebrew to not screw around with / conflict with pip packgaes), I'm all for it.

@jwnimmer-tri @jamiesnape Can I ask what your opinions are?

EDIT: Working with virtualenv will be a workflow similar to what we may want with ROS workspaces.

EDIT 2: For now marking priority as medium. Can update if needed.

\cc @calderpg-tri

@jamiesnape
Copy link
Contributor

I am pretty much on record with hating reinventing apt and pip with Bazel, so it would make me happy.

@EricCousineau-TRI
Copy link
Contributor Author

EricCousineau-TRI commented Mar 16, 2018

(For myself) A tiny breadcrumb for patching local PIP packages:
https://stackoverflow.com/questions/5570666/patching-python-packages-installed-as-dependencies-with-pip

UPDATE: I believe I understand Robin's comments more now about virtualenv; it would make supporting both Python2 and Python3 much (read: a crap ton) easier for managing the switches. Rather than throwing a crap ton of Py3 duplicates in our Bazel code, we just change the environment (if needed).

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented Mar 16, 2018

On the subject of needing numpy fixes (not necessarily virtualenv or python2/3 stuff), I am not sure that I'm following, but my main concern is that in the same way libdrake.so needs to play well with the rest of the system libraries at runtime, because Drake is a library not a program -- pydrake needs to play well with the rest of the python environment at runtime. If we need to use non-default versions of things like numpy, how to we know that other code's use of numpy still works? I think we should go out of our way to be compatible with stable versions of common libraries, and not to require bleeding-edge versions.

@jamiesnape
Copy link
Contributor

(For myself) A tiny breadcrumb for patching local PIP packages:
https://stackoverflow.com/questions/5570666/patching-python-packages-installed-as-dependencies-with-pip

I am certainly not in favor of patching packages.

@EricCousineau-TRI
Copy link
Contributor Author

Posted responses about package patching in #8116.

@EricCousineau-TRI
Copy link
Contributor Author

EricCousineau-TRI commented Apr 11, 2018

I had seen some discussions on Slack about virtualenv, and ran into a blocking issue for #8452 that does require a two-line patch to numpy (still weighing cost/benefit, and I'm definitely not excited about the need for patching).

I'm not sure if we've considered it before, but there do seem to be some mechanisms to enable having a checked-in (or, I would assume, a generated) Python runtime, py_runtime and bazel build --python_top={target}:
https://github.com/erain/bazel-python-example
My assumption is that if we have a self-contained virtualenv (or use --system-site-packages), then we can create a repository to generate this environment, and point to the generated Python binary to use for the project. (Or permit an external virtualenv, and just use whatever binary and dependencies if people want to muck around with that.)

This could also pave the way towards containing support Python2+3 per #8352, most likely via a configuration switch.

@jwnimmer-tri
Copy link
Collaborator

I am not generally as worried about making our development and test environment work. I am most worried about how the installed pydrake interacts with the rest of the python universe. I think the clearest way to look at this question (and the wider python2/3 and forked-dependencies questions) is to first explain what the installed experience looks like, and then from there work out what the development environment looks like.

@jamiesnape
Copy link
Contributor

jamiesnape commented Apr 12, 2018

Given that I am strongly against patching packages, I am not sure a virtualenv solves anything in the grand scheme of third-party Drake usage.

(My personal opinion is that whether to use a virtualenv should be the choice of the end-user and not something required.)

@jwnimmer-tri jwnimmer-tri added the component: build system Bazel, CMake, dependencies, memory checkers, linters label Apr 28, 2020
@jwnimmer-tri jwnimmer-tri added component: distribution Nightly binaries, monthly releases, docker, installation and removed component: build system Bazel, CMake, dependencies, memory checkers, linters labels May 28, 2020
@EricCousineau-TRI
Copy link
Contributor Author

EricCousineau-TRI commented Sep 17, 2020

Ran into another case of desiring this: Better ipywidgets support in #14082

FYI @RussTedrake

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented Jan 18, 2023

A motivating use case is on the rise -- Homebrew seems incapable of making a Python minor version transition without breaking all of non-first-party Python libraries as they go (e.g., recently scipy only works with 3.10, but the default python3 still points to 3.9).

It's probable that Drake on macOS should only use the Python interpreter from Homebrew, and for other Python ecosystem dependencies we use pip (with a lockfile).

@jwnimmer-tri
Copy link
Collaborator

Update: I have somewhat of a prototype of this now in TRI Anzu. After we gain some experience with that over the next week or two, I'll dump a snapshot of the code into this ticket, along with some more specific goals, and we can take it from there.

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented Aug 1, 2024

For kitware:

  • The first milestone here should be to use a venv on macOS, for Drake source builds.
    • We want to satisfy two "user stories" here:
      • A drake developer on macOS can build/test Drake without pip-installing anything system-wide, i.e., we don't have the --break-system-packages thing during install_prereqs.
      • A drake user on macOS can CMake install Drake without pip-installing anything system-wide as prerequisite to building.
  • We can defer any Ubuntu changes until a later milestone.
  • The venv will (eventually) need to be part of the provisioned images.

Here's the sample Anzu code to get started: anzu-snippets.zip.

Notes:

  • Probably requirements.{in,txt} should live in the setup/... tree somewhere, not at the project root.
  • Also needs changes to tools/workspace/default.bzl to call the new venv_repository rule.
  • Use the venv as deps = ["@venv"] on a py_library or py_binary target.

This should at least be able to supplant the source_distribution requirements management.

For binary_distribution/requirements.txt, bazel stuff doesn't help at all. We'll need to do something different there.

@mwoehlke-kitware
Copy link
Contributor

I have no idea what an "anzu" is. I'm guessing all instances of that should be replaced with "drake"? If I do that, I can run the scripts, but it isn't clear how this is supposed to be actually consumed by other Drake bits that need the venv.

Also, if this is intended to replace what's currently pip installed, does that mean we'd drop the --[without-]test-only split?

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented Aug 15, 2024

I have no idea what an "anzu" is

It's the codename for TRI's internal git repository.

I'm guessing all instances of that should be replaced with "drake"?

Yes.

If I do that, I can run the scripts, but it isn't clear how this is supposed to be actually consumed by other Drake bits that need the venv.

That's this part:

Also needs changes to tools/workspace/default.bzl to call the new venv_repository rule.
Use the venv as deps = ["@venv"] on a py_library or py_binary target.

The way to check is to find a test that needs something from pip in order to succeed, and then iterate to get that working.

Does that mean we'd drop the --[without-]test-only split?

Probably not. Users who are installing Drake from source (e.g. via CMake) should only by default need to pay the download cost of the non-testonly dependencies. The --with-test-only should (now) be opt-in, for use only by the Drake Developers (and Drake CI).

@jwnimmer-tri
Copy link
Collaborator

One update from f2f: the unique challenge of "default" vs "testonly" (versus what Anzu does) is to have sync know which choice should be used. The way we can do that is have install_prereqs.sh (really, probably venv/setup?) symlink the desired requirements.txt into or nearby the venv. Then sync can refer to the symlinked file, instead of hard-coding a specific requirements.txt.

@RobotLocomotion RobotLocomotion deleted a comment from jwnimmer-tri Aug 22, 2024
@mwoehlke-kitware
Copy link
Contributor

Okay, this one is puzzling:

FAIL: //bindings/pydrake/common:py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint (see .../execroot/drake/bazel-out/darwin_arm64-opt/testlogs/bindings/pydrake/common/py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint/test.log)
INFO: From Testing //bindings/pydrake/common:py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint:
==================== Test output for //bindings/pydrake/common:py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint:
Traceback (most recent call last):
  File ".../execroot/drake/bazel-out/darwin_arm64-opt/bin/bindings/pydrake/common/py/_test/serialize_test_bar.cpython-312-darwin.so_private_headers_cc_impl_cpplint.runfiles/drake/../styleguide/cpplint/cpplint.py", line 56, in <module>
    import six
ModuleNotFoundError: No module named 'six'

It seems like the correct patch (ignore the likely style error) is:

diff --git a/tools/workspace/styleguide/package.BUILD.bazel b/tools/workspace/styleguide/package.BUILD.bazel
index 03b5770120..8001a7291d 100644
--- a/tools/workspace/styleguide/package.BUILD.bazel
+++ b/tools/workspace/styleguide/package.BUILD.bazel
@@ -22,6 +22,7 @@ py_binary(
     python_version = "PY3",
     srcs_version = "PY3",
     visibility = [],
+    deps = ["@venv"],
 )
 
 alias(

...but I'm still getting the above error.

@mwoehlke-kitware
Copy link
Contributor

Another issue... since we're only doing macOS for now, just adding deps=["@venv"] breaks Linux. I suppose there's a way to say "this is only a dep on macOS", but that's a bunch of extra work that we expect to eventually remove. Is there instead a way to make @venv a stub (for now) on not-macOS?

@jwnimmer-tri
Copy link
Collaborator

It would be simpler for me if you could post all questions using Reviewable, on the pull request, so we can discuss in situ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants