PEX_PATH transitivity #1423
Comments
Both 1 and 2 are correct. Pex could be modified to walk a graph of pex_path metadata in PEXEnvironment, but - IIUC - that feature would just save you a line or two of code implementing a hack that will be replaced anyhow if the hack bears fruit. Is that about right? |
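For concreteness, a rough sketch of what such a transitive walk could look like when done externally, assuming the standard layout where a zip-format PEX carries a JSON PEX-INFO file at its root with an optional colon-separated "pex_path" entry. The helper name and error handling here are illustrative only, not Pex's implementation:

# Illustrative sketch only: walk pex_path entries transitively by reading each
# PEX's embedded PEX-INFO. Assumes zip-format PEXes with PEX-INFO at the root
# and a colon-separated "pex_path" string.
import json
import os
import zipfile


def transitive_pex_path(pex_file, seen=None):
    seen = seen if seen is not None else set()
    with zipfile.ZipFile(pex_file) as zf:
        pex_info = json.loads(zf.read("PEX-INFO"))
    for entry in pex_info.get("pex_path", "").split(":"):
        if not entry or entry in seen or not os.path.exists(entry):
            continue
        seen.add(entry)
        transitive_pex_path(entry, seen)
    return sorted(seen)


# Example: print the transitive closure of pex_path entries for a root PEX.
print(transitive_pex_path("requests.pex"))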
I actually may not understand what's going on here. I'll take a look tomorrow. It would be useful if you could sketch this all out 1st with just Pex command lines, but I can do that too to create a repro case that doesn't have Pants code entangled / details missing. |
Sure, sorry. Here is a small repro (which, confusingly, uses
|
Thanks for the repro case. I don't repro the
$ python -mvenv pex.venv
$ pex.venv/bin/pip install pex==2.1.45
$ for i in certifi charset_normalizer idna urllib3; do pex.venv/bin/pex "${i}" -o "${i}.pex"; done
$ pex.venv/bin/pex requests --no-transitive --pex-path=certifi.pex:charset_normalizer.pex:idna.pex:urllib3.pex -o requests.pex
$ ./requests.pex
Traceback (most recent call last):
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/pex.py", line 482, in execute
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/pex.py", line 139, in activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/pex.py", line 126, in _activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 428, in activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 784, in _activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 608, in resolve
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 692, in resolve_dists
pex.environment.ResolveError: Failed to resolve requirements from PEX environment @ /tmp/issues-1423/requests.pex.
Needed manylinux_2_33_x86_64-cp-39-cp39 compatible dependencies for:
1: urllib3<1.27,>=1.21.1
Required by:
requests 2.26.0
But this pex had no 'urllib3' distributions.
2: certifi>=2017.4.17
Required by:
requests 2.26.0
But this pex had no 'certifi' distributions.
3: charset-normalizer~=2.0.0; python_version >= "3"
Required by:
requests 2.26.0
But this pex had no 'charset-normalizer' distributions.
4: idna<4,>=2.5; python_version >= "3"
Required by:
requests 2.26.0
But this pex had no 'idna' distributions.
$ PEX_PATH=certifi.pex:charset_normalizer.pex:idna.pex:urllib3.pex ./requests.pex -c 'import certifi, idna, urllib3, charset_normalizer, requests'
Traceback (most recent call last):
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/pex.py", line 482, in execute
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/pex.py", line 139, in activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/pex.py", line 126, in _activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 428, in activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 784, in _activate
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 608, in resolve
File "/tmp/issues-1423/requests.pex/.bootstrap/pex/environment.py", line 692, in resolve_dists
pex.environment.ResolveError: Failed to resolve requirements from PEX environment @ /tmp/issues-1423/requests.pex.
Needed manylinux_2_33_x86_64-cp-39-cp39 compatible dependencies for:
1: urllib3<1.27,>=1.21.1
Required by:
requests 2.26.0
But this pex had no 'urllib3' distributions.
2: certifi>=2017.4.17
Required by:
requests 2.26.0
But this pex had no 'certifi' distributions.
3: charset-normalizer~=2.0.0; python_version >= "3"
Required by:
requests 2.26.0
But this pex had no 'charset-normalizer' distributions.
4: idna<4,>=2.5; python_version >= "3"
Required by:
requests 2.26.0
But this pex had no 'idna' distributions.
Looking at the code, this makes sense. The
So, there should probably be expanded CLI help here:
That hints at all this but by no means spells it out clearly. As to what you're trying to do here - I think supporting a |
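As a side note for anyone retracing the repro above, a quick way to see what the build-time --pex-path actually recorded is to read the metadata out of the built zip. A sketch, assuming a zip-format PEX with its JSON PEX-INFO at the archive root:

# Sketch: dump the requirements and pex_path recorded in requests.pex.
import json
import zipfile

with zipfile.ZipFile("requests.pex") as zf:
    pex_info = json.loads(zf.read("PEX-INFO"))

print(pex_info.get("requirements"))
print(pex_info.get("pex_path"))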
Shoot... you're right. Looks like I accidentally ran that case without
Got it: thanks. Will respond there. |
Closing since this was answered and superseded by #1424, which has since shipped and, IIUC, been used to good effect in Pants. |
FWIW: recursive composition of PEXes (as alluded to here and described in the second half of this comment) came up again in the context of pantsbuild/pants#10864. "subset per (test) root" has been workable with optimizations like #1534, but it seems clear that also subsetting for inner nodes (i.e. per file, or even per tens of files) won't scale. I'm not sure that it will be a priority any time soon, but I think that it would be worth re-opening this issue at some point such that we can externally/recursively compose PEXes via the PEX_PATH. |
Why are you using Pex at all at that point? I feel like I'm missing something. The remaining slowness in subsetting is in logic you'll need to implement outside of Pex unless you know some magic I don't. You'll need to be able to evaluate the booleans |
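Assuming the booleans in question are the PEP 508 environment markers seen in the resolve errors above (e.g. python_version >= "3"), a minimal sketch of evaluating them outside of Pex with the third-party packaging library:

# Minimal sketch: evaluate PEP 508 environment markers, one of the checks a
# subset calculation has to perform per requirement. Requires `packaging`.
from packaging.markers import Marker

marker = Marker('python_version >= "3" and sys_platform != "win32"')

# Evaluate against the current interpreter...
print(marker.evaluate())

# ...or against an explicit target environment.
print(marker.evaluate({"python_version": "2.7", "sys_platform": "linux"}))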
Maybe simpler to think about: what advantage do you think you have and can leverage in Pants that Pex doesn't have? |
I think that the primary difference is that subsetting requires creating and then tearing down a sandbox for the subsetting. We should probably apply
AFAIK, spread PEXes are composed of "zipped installed wheels", which are sort of like eggs, but not exactly. We could re-invent that format externally to PEX, but. If the relevant subset of the "zipped installed wheels" were placed directly on the python path ( |
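To make that idea concrete, a hypothetical sketch of the external composition being described: given a set of already-installed wheel directories (the directory names below are placeholders, not a real Pex cache contract), put them straight on PYTHONPATH and run the interpreter, skipping PEX activation entirely:

# Hypothetical sketch: compose an environment from installed-wheel directories
# by exporting them on PYTHONPATH. Paths are illustrative placeholders.
import os
import subprocess
import sys

subset_dirs = [
    "/tmp/wheel-chroots/requests-2.26.0",
    "/tmp/wheel-chroots/urllib3-1.26.6",
]

env = dict(os.environ)
env["PYTHONPATH"] = os.pathsep.join(subset_dirs)
subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.path)"], env=env, check=True
)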
You've failed to address my point about the remaining overhead in subsetting. That's the same overhead Pants would need to incur using PEX_PATH, namely calculating the subset of PEXes to compose. In my profiling case that calculation took the remaining 2s of what was initially a 4s subset operation. I killed 2s of that by not re-hashing distributions.
Ok, so Pants has a slowness problem there but it sounds like it also has an undeployed workaround for that bit.
The change in Pex to support creating a venv by using symlinks solves any perf issue in creating venvs for large dists IIUC. |
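A rough sketch of the symlink trick being referred to, under the assumption that the distributions already live unpacked in per-wheel directories (names here are hypothetical, and this is not Pex's actual implementation): symlinking their top-level entries into a venv's site-packages makes venv population cost roughly proportional to the number of entries rather than the number of bytes copied.

# Rough sketch: populate a venv's site-packages by symlinking top-level
# entries of already-installed wheel directories instead of copying them.
import os


def link_dists_into_venv(installed_wheel_dirs, site_packages):
    for dist_dir in installed_wheel_dirs:
        for entry in os.listdir(dist_dir):
            target = os.path.join(dist_dir, entry)
            link = os.path.join(site_packages, entry)
            if not os.path.exists(link):
                os.symlink(target, link)


# Hypothetical usage:
# link_dists_into_venv(
#     ["/tmp/wheel-chroots/requests-2.26.0"],
#     "/tmp/my.venv/lib/python3.9/site-packages",
# )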
Going along with your optimal cold case tack: in order to avoid the Xms it takes to build a venv (I'll circle back and fill in this number), you need to be using a scrubbed python |
Finally, re the cold case, we're still talking ineffective since the 2s subset calculation time is still there and the venv creation time is less than that (closer to 1s max IIRC). So if we really want to support fast cold subsetting, the 0th order problem is the subset calculation speed. |
Besides switching to a new language to implement the PEX CLI in, the only thing that would work in a fairly obvious way to speed up subset calculation would be to avoid the expensive parts, which broke down into parsing requirements in two places. The easiest way to avoid that is to do it just once. That means a Pex daemon. Pants doesn't support those generically yet though; so either it would need to grow persistent worker support paired with a Pex daemon mode, or Pex would need to have a transparent daemon mode per-PEX_ROOT that Pants could use without doing anything except passing a flag. |
A simpler way to handle subsetting perf improvement is probably to introduce a new seed mode that outputs something like:

[
  {
    "interpreter": "...",
    "requirements": {
      "foo>1.2": "fecdbc8b63360ced5e6d0ab2819586145057a47c/foo-1.3--py2.py3-none-any.whl"
    },
    "dependencies": {
      "fecdbc8b63360ced5e6d0ab2819586145057a47c/foo-1.3--py2.py3-none-any.whl": [
        "fecaabff0e1528da693cb73c3e2b4e50af74c045/isort-5.6.4-py3-none-any.whl",
        "fe3c088f452c335d15e29a3bd29cb550feeba62e/google_auth-1.35.0-py2.py3-none-any.whl"
      ],
      "fecaabff0e1528da693cb73c3e2b4e50af74c045/isort-5.6.4-py3-none-any.whl": [],
      "fe3c088f452c335d15e29a3bd29cb550feeba62e/google_auth-1.35.0-py2.py3-none-any.whl": [
        "fecaabff0e1528da693cb73c3e2b4e50af74c045/isort-5.6.4-py3-none-any.whl"
      ]
    }
  }
]

That data could be loaded and parsed quickly and used to calculate a subset ~as fast as possible for any of the local interpreters used to build the PEX. If --platforms were included, it would not work for those - no entry would be output. Pants' use-case though is exactly only local interpreters. As such, I think this additional seed mode makes sense both for Pex standalone and to satisfy Pants' need to cut down on subsetting time. That time is now amortized up-front in one ~2s step (sticking to the 4s subset test I've been referring to) to output this file. The loading of the file and forming of a subset from it should be roughly O(10ms), not having written any code yet. This has a distinct advantage over PEX_PATH transitivity, since PEX_PATH requires running all the PEX resolve code / environment marker evaluation, etc. on each boot. The approach above allows all that to be skipped if a venv is assembled from the subset, or a PYTHONPATH is constructed - with all the caveats mentioned about that not working for bad ns-pkg dists out in the wild and dists like Pylint. |
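To illustrate why the lookup side of this would be cheap, a sketch against the hypothetical seed format above (field names and the lookup strategy are assumptions, not an implemented API): load the file and compute the transitive installed-wheel subset for a set of requirement roots with a simple graph walk.

# Sketch against the hypothetical seed format above: resolve requirement roots
# to the transitive set of installed-wheel entries with a plain graph walk.
import json


def subset(seed_file, interpreter, roots):
    with open(seed_file) as fp:
        entries = json.load(fp)
    entry = next(e for e in entries if e["interpreter"] == interpreter)
    requirements, dependencies = entry["requirements"], entry["dependencies"]

    selected, stack = set(), [requirements[root] for root in roots]
    while stack:
        wheel = stack.pop()
        if wheel not in selected:
            selected.add(wheel)
            stack.extend(dependencies.get(wheel, []))
    return sorted(selected)


# Hypothetical usage: subset("seed.json", "/usr/bin/python3.9", ["foo>1.2"])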
@stuhood this is all a very complicated set of perf / compatibility trade-offs. I think seeding is the last key to making subsets, whether materialized in venvs or PYTHONPATH or however, as fast as possible in Python. That said, this could all use an issue to explore the goal instead of the mechanism - namely fast-as-possible subsetting performance for local use PEXes. If that makes sense to you, I'll break one out and tame this mess there with some pared-down perf-number examples of today's slow bits and the ideas for solving these holistically - i.e.: how to get a subset calculated fast, how to assemble a subset fast robustly, how to assemble a subset fast less robustly, and how to scrub sys.path performantly. All the bits needed to actually create and execute a subset with the standard PEX guarantees of hermeticity. |
Yea. The goal in this case is probably "being able to dynamically create a venv that is a subset of a lockfile as fast as possible in order to support fine-grained invokes of tools (such as recursive compilation with
How different is this from the proposed lockfile format? But yea, this seemingly amounts to exporting the fully resolved graph.
It does make sense, thanks. I'm doing some triage of Pants performance issues this week in order to figure out where we should invest here: I don't think the relative priority of this issue is as high right now as 1) finishing lockfile support, 2) finishing enabling the Pants rust client by default, but maybe a design sketch in another ticket would be convincing enough to prioritize. |
Relatedly: how likely is it that producing the " That would 1) speed up changes to the lockfile, and 2) implicitly parallelize the work of actually building wheels into consumers of the lockfile. |
A lot. That format includes unevaluated environment markers. The whole point here is those need to be evaluated ahead of time once to save time later. That can only be done for interpreters-in-hand. In the --seed scenario, that is exactly what we happen to have. The lockfile conflation is definitely off. For lockfile generation there is already no building of wheels or installing of them. The lockfile generation is implemented using only
So, I'm not sure what angle you've got here. Are you trying to milk parallelism for those cases where the Pex --jobs are configured lower than the Pants parallelism? Pex already builds and installs in parallel using --jobs; so it's true this is eager and not lazy, but afaict it's never wasted work. Pants will always need the built / installed wheels. Answering the question though, right now wheel metadata is what is used to calculate the active dependency graph for the current interpreter given a set of requirement roots. I can't see this sanely budging since that would mean Pex would need to learn how to extract metadata from sdists; i.e. running setup.py egg_info old school or PEP-517 new-school (which requires falling back to just plain building the wheel since metadata-only build methods are optional in PEP-517). So, wheels required, installing them is not. That said, installing the wheels is what is required to actually build any sort of PEX today. To take that step out (again so why?... so Pants could eke out more parallelism for the --jobs < Pants parallelism case?) would require the Pex runtime learning how to install wheels. That either means adding the full bulk of Pip to the runtime (it's only used in the build-time tool today and is 2.3MB compressed vs the rest of the runtime which is ~400KB compressed) or else writing a bunch of code. |
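For reference, the kind of wheel metadata read being described: a wheel's *.dist-info/METADATA file is an email-style header block whose Requires-Dist entries (with environment markers still unevaluated) drive the dependency graph calculation. A minimal sketch of extracting them:

# Minimal sketch: pull Requires-Dist entries (still carrying unevaluated
# environment markers) out of a wheel's *.dist-info/METADATA file.
import zipfile
from email.parser import Parser


def requires_dist(wheel_path):
    with zipfile.ZipFile(wheel_path) as zf:
        metadata_name = next(
            name for name in zf.namelist() if name.endswith(".dist-info/METADATA")
        )
        metadata = Parser().parsestr(zf.read(metadata_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []


# Example: print(requires_dist("requests-2.26.0-py2.py3-none-any.whl"))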
You're right if you're running all of the tests in a repository (the CI case). But if we're going to be subsetting a whole-repo resolve, the effect right now is that all wheels for the entire resolve need to be built in order to run a single test, even if that single test only needs a small subset of the resolve.
I don't think that I mean that wheel building/installing should be deferred until runtime, per se... I think that I had been thinking that in our current world where we 1) fully execute/build-wheels for a large resolve, 2) subset that resolve into a PEX, 3) build a venv from the PEX... then if we moved the wheel building from step 1 to step 2, we'd introduce good laziness for the "I'm running a single test" case. But if step 2 were to go away, and we were instead directly building a venv from a lockfile/seed-resolve, then lazily building the wheels would mean essentially ... resolving the subset directly into a venv from the lockfile/seed-resolve, I suppose. You're right that that is odd... it would essentially be asking PEX the tool not to actually bother to create a PEX file, and to just create a venv. It seems very close to the "Installer" role of PEP-665, but with the (added?) ability to install only a subset of the lockfile. |
When building a graph of PEXes, each of which contains distributions for a single requirement (*mostly: disregard the pluggy_zipp_importlib-metadata_pytest_attrs cluster), it appears that the embedded pex_path in the PEX-INFO is not consumed. I'm wondering whether:

As an example: a single requirement PEX containing humbug, with a pex_path containing its requirement PEXes, might have PEX-INFO that looks like:

But consuming it will fail to find the requests distribution embedded in the __reqs/requests.pex PEX:

Below either PexBuilder.set_script or bin.pex.seed_cache: