Removing "activation" time from PEX #1115
How confident are you of your interpretation of the speedscope? FWICT > 50% of "resolving" is importing, which was the initial focus of #930. We cannot get around importing, multiplatform PEX or not, except by eliminating expensive imports from the bootstrap.
Somewhat confident. Were you able to get it to load? The I used
@jsirois: That's not the callstack: i.e. In this screenshot (a flamegraph), the left 25% of the script is under Also, speedscope overrides the "Find-in-Page" implementation in a really useful way, and so "Find-in-Page" for
Gotcha. So there are 11 instances of _activate (/home/vagrant/slow-pytest/process-execution3RBeWT/pytest.pex/.bootstrap/pex/environment.py:439) in that section. That function is called once per PEXEnvironment activation and that implies your example has 11 PEXes conjoined via PEX_PATH. Is that true? ... Ah, nm - the line changes.
Ok, so in this example 27% of the total runtime is spent resolving amongst 3 PEX files conjoined via PEX_PATH.
The resolve is delegated to pkg_resources.{Environment,WorkingSet}, which do more than we need. We just need: And that looks like it takes only 5% of the runtime. So this means ditching PEXEnvironment inheritance from Environment / use of WorkingSet in favor of just ripping through all distributions in the PEX and evaluating them with can_add directly before adding them to sys.path.
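A minimal sketch of that direct approach, assuming each distribution bundled in the PEX carries a set of compatibility tags. The `Dist` type, `select_dists` and the tag strings are hypothetical stand-ins for illustration, not Pex or pkg_resources APIs; the tag intersection merely plays the role of a `can_add`-style check.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Dist:
    """Hypothetical stand-in for a distribution bundled in a PEX."""

    project: str
    location: str
    tags: FrozenSet[str]


def select_dists(dists: Iterable[Dist], supported: FrozenSet[str]) -> List[Dist]:
    """Keep the first distribution per project that shares a supported tag."""
    chosen = {}
    for dist in dists:
        # Stand-in for a can_add-style check: any shared tag means compatible.
        if dist.tags & supported and dist.project not in chosen:
            chosen[dist.project] = dist
    return list(chosen.values())


# A multiplatform PEX may hold several copies of the same project.
dists = [
    Dist("psutil", "deps/psutil-linux.whl", frozenset({"cp38-cp38-linux_x86_64"})),
    Dist("psutil", "deps/psutil-macos.whl", frozenset({"cp38-cp38-macosx_10_15_x86_64"})),
    Dist("six", "deps/six.whl", frozenset({"py3-none-any"})),
]
supported = frozenset({"cp38-cp38-linux_x86_64", "py3-none-any"})

selected = select_dists(dists, supported)
# These locations are what would be appended to sys.path.
sys_path_entries = [dist.location for dist in selected]
```

This is a single pass over the distributions with no working-set bookkeeping, which is the point of the proposal; a real implementation would also need to honor version constraints.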
Another tack would be to implement running PEXes as normal applications as proposed in #962. In that approach there is a one-time venv setup cost (unmeasured cost at this point) at which point every execution going forward re-execs into the venv (PEX_ROOT/venvs/...) in the same manner as I have venv creation implemented as a runtime tool that can be manually run (build PEX file using new
Interesting! A fairly important benchmark is the one in the description: the two pexes are 1) pytest, 2) user requirements, with loose sources sitting alongside. The loose sources are edited the most, and the two pexes should be relatively more stable, and thus possible to reuse. I think that it is also entirely possible for that to be one pex, because I don't think we gain much benefit from it being two: just from the sources being loose. If we assume that the goal of a pex is always to run it multiple times, then "creating pexes is fast" is a lot less important than "running pexes is fast". So particularly if more of the preparation of the venv can be frontloaded to construction time (which was the thrust of my "move the calculation of what we will be activating from pex runtime to pex construction time" comments in the description), then that sounds interesting.
N.B.: Even if the venv construction was only done on demand at runtime, Pants could
Yea, true. But it would be great if even the "first run" of a PEX was faster. Because the benefit of the extracted/reusable venv can only be realized if you have support for mutable caches (unless the venv itself is relocatable?)
The venv is relocatable as a whole and its output path is also controllable. My current experiment that gets venv working has this CLI syntax:
Add a new `--include-tools` option to include any pex.tools in generated PEX files. These tools are activated by running PEX files with PEX_TOOLS=1. The `Info` tool seeds the tool set and simply dumps the effective PEX-INFO for the given PEX. Work towards pex-tool#962 and pex-tool#1115
Rewinding a bit to discuss where this is headed: the goal right now is In particular, one question I have is: in which cases would someone want to:
Cases two and three are particularly interesting. I can definitely see this being useful from a normalization perspective (Pants uses PEX internally, but can export a venv), but are there benefits to two over three for the pytest use case?
@kwlzn opined on item 1 here: #962 (comment)
Item 2 would clearly be immediately useful to Pants. Item 3 is not a thing for Pex since Pex fundamentally supports two features:
There is already a tool for item 3 and that's You left out item 4, which is build-a-pex-that-self-extracts-to-a-venv + run-a-pex. That will build upon item 2 and close #962.
Would you recommend that Pants do that for pytest? Or is item 2 a better fit for that use case?
I was assuming that if that was sufficiently fast, it would be the default implementation of item 1, so I didn't include it as a separate item. Should it be?
I'm not sure yet, but my recommendation is not really relevant. The only relevant thing is the performance comparison, which we'll have shortly.
Again, not sure yet. Need timings, which we'll have shortly. The only timings I have so far are for case 2 (build-a-pex + extract-the-pex-to-a-venv + run the venv). That case is == to raw venv speed +/- 1ms (noise) for runs 2+. For run 1, the summed time of build-a-pex + extract-the-pex-to-a-venv + run the venv, it is 70 to 140 ms slower in my current test cases on my machine. Bear in mind - this tools / venv approach is needed without regard for this issue - see #962, but there are other problems caused by PEX's custom venv solution that the tools / venv will fix which will be a win for some users who can't use PEX at all to bundle their app today. IOW Pants perf concerns have no trumping influence on the need for this approach, only for prioritization of shared resources.
OK, for a 47MB pex with 114 distributions just activating the PEX timings: Old style
New style:
Really, really awesome stuff.
This fixes binary canonicalization to handle virtual environments created with virtualenv instead of pyvenv. It also adds support for resolving the base interpreter used to build a virtual environment. The ability to resolve a virtual environment interpreter will be used to fix pex-tool#1031 where virtual environments created with `--system-site-packages` leak those packages through as regular sys.path entries otherwise undetectable by PEX. Work towards pex-tool#962 and pex-tool#1115.
Add a `venv` tool to create a virtual environment from a PEX file. The virtual environment is seeded with just the PEX user code and distributions applicable to the selected interpreter for the local machine. The virtual environment does not have Pip installed by default although that can be requested with `--pip`. The virtual environment comes with a `__main__.py` at the root of the venv to emulate a loose pex that can be run with `python venv.dir` just like a loose pex. This entry point supports all the behavior of the original PEX file not related to interpreter selection, namely support for PEX_SCRIPT, PEX_MODULE, PEX_INTERPRETER and PEX_EXTRA_SYS_PATH. A sibling `pex` script is linked to `__main__.py` to provide the maximum performance entrypoint that always avoids interpreter re-execing and thus yields equivalent performance to a pure virtual environment. Work towards #962 and #1115.
The new --venv execution mode builds a PEX file that includes pex.tools and extracts itself into a venv under PEX_ROOT upon 1st execution or any execution that might select a different interpreter than the default. In order to speed up the local build and execute case, --seed mode is added to seed the PEX_ROOT caches that will be used at runtime. This is important for --venv mode since venv seeding depends on the selected interpreter and one is already selected during the PEX file build process. Fixes #962 Fixes #1097 Fixes #1115
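The extract-once behavior of --venv mode can be sketched roughly as follows. This is an illustration under stated assumptions, not Pex's actual code: the `venvs/<hash>/<interpreter>` layout, the `venv_dir` and `ensure_venv` helpers, and the example hash and interpreter identifiers are all hypothetical.

```python
import os
import tempfile


def venv_dir(pex_root, pex_hash, interpreter_id):
    # Hypothetical layout: one venv per (PEX contents, interpreter) pair.
    return os.path.join(pex_root, "venvs", pex_hash, interpreter_id)


def ensure_venv(pex_root, pex_hash, interpreter_id, extract):
    """Return (path, created), calling extract(path) only if the venv is missing."""
    path = venv_dir(pex_root, pex_hash, interpreter_id)
    if os.path.isdir(path):
        # Fast path: the venv already exists, so re-exec into it directly.
        return path, False
    extract(path)
    return path, True


with tempfile.TemporaryDirectory() as pex_root:
    # os.makedirs stands in for the real, more expensive venv extraction.
    first = ensure_venv(pex_root, "abc123", "cpython-3.8", os.makedirs)
    second = ensure_venv(pex_root, "abc123", "cpython-3.8", os.makedirs)
    created_flags = (first[1], second[1])
```

Only the first run pays the extraction cost; later runs with the same interpreter reduce to a directory check, while a different interpreter maps to a different path and triggers a fresh extraction.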
There was still overhead left in the PEX zip python bootstrap code running just enough to check if its venv was already present in the PEX_ROOT before re- |
As shown in #930, in some cases the activation of a PEX takes 50% of the total runtime, primarily on the task of "resolving" the dependencies to use within that run. This resolution is necessary for cross-platform and portable PEX files which might contain multiple copies of certain wheels.
Two potential ways to eliminate this time might be:
PEX-INFO
, perhaps) for use verbatim at runtime. This would involve doing a fuzzy dictionary lookup of the target platforms for the PEX to determine which set of contained wheels to use.

In both cases, one challenge would be giving good error messages if a PEX had been moved to an incompatible platform. If the PEX explicitly embedded the platform(s) it was built to target, it could quickly fail if it was invoked on an incompatible platform (without consulting its list of wheels).
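A minimal sketch of the fail-fast idea, assuming the PEX embeds a mapping from target platform to the wheels chosen for it at build time. The mapping shape, the platform strings, and the `wheels_for_platform` and `IncompatiblePlatformError` names are hypothetical, and the lookup here is exact rather than fuzzy.

```python
class IncompatiblePlatformError(Exception):
    """Raised when a PEX is run on a platform it was not built for."""


def wheels_for_platform(embedded, current_platform):
    """Return the precomputed wheel list, or fail without scanning any wheels."""
    try:
        return embedded[current_platform]
    except KeyError:
        targets = ", ".join(sorted(embedded))
        raise IncompatiblePlatformError(
            "This PEX targets {}; it cannot run on {}.".format(targets, current_platform)
        ) from None


# Hypothetical build-time output: one resolved wheel set per target platform.
embedded = {
    "linux_x86_64-cp-38": ["psutil-linux.whl", "six.whl"],
    "macosx_10_15_x86_64-cp-38": ["psutil-macos.whl", "six.whl"],
}

linux_wheels = wheels_for_platform(embedded, "linux_x86_64-cp-38")

try:
    wheels_for_platform(embedded, "win_amd64-cp-38")
except IncompatiblePlatformError as e:
    error_message = str(e)
```

Because the error names both the embedded targets and the current platform, a moved PEX fails with an actionable message before any resolution work begins.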