Large amount of time spent in "zipping" when building a pex from a --pex-repository
#1675
Comments
Why is the layout zipapp? The packed layout was built to minimize zip time, maximize cache hits, etc.
OK, assuming there is some reason for needing zipapp, first let's see what it takes with native tools:
I think I can end there, even with a lack of apples-to-apples on your timing measurement. Creating a zip that big from that many loose files simply takes that long. I'll close this as an answered question, but please feel free to re-open and explain more what you're after if you're, for example, looking for Pex to experiment with or expose compression levels (this experiment used defaults).
Because it's an externally packaged app (the …
OK. I have never invested time in re-thinking zipping. There may be perf to be squeezed there, but I'm honestly ignorant. IIUC each entry is separately compressed, which implies entries could be prepared in parallel, but it's unclear to me if this is feasible, worthwhile, hard, etc.
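The parallelism the comment above speculates about can be sketched roughly as follows. This is not Pex's implementation, just an illustration of the idea: each zip entry is an independent raw-deflate stream, so the CPU-bound compression step can run in a thread pool (CPython's `zlib` releases the GIL while compressing), leaving only the cheap serial step of writing the records. The function names here are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def deflate(data: bytes, level: int = 6) -> bytes:
    # Produce a raw deflate stream, the format stored inside a zip
    # entry (wbits=-15 suppresses the zlib header/trailer).
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def parallel_deflate(entries, level=6, workers=8):
    # entries: list of (name, bytes). Compression runs in parallel;
    # serially writing the local-file headers and central directory
    # around these blobs is left to the caller.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = list(pool.map(lambda d: deflate(d, level),
                              (data for _, data in entries)))
    return [(name, blob) for (name, _), blob in zip(entries, blobs)]
```

Whether this pays off in practice depends on how much of the wall time is compression versus file-system walking and I/O.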
I expect that … But with regard to making the …
Yeah, I'm pretty loath to introduce external dependencies, but perhaps a probe for … Or, maybe just expose the compression level. I try 7zip here, first with the default, then multithreaded across #cores with BZip2 compression (which it says is the only compression it will parallelize), then no compression, and finally zip with no compression:
Since no compression takes more than an order of magnitude less time, perhaps that's enough: folks can decide to trade size for speed now, and suck up the speed cost on the network transfer later, if there is one.
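The size-for-speed trade-off described above can be illustrated with the stdlib `zipfile` module: the same file tree written with `ZIP_STORED` (no compression, fast) versus `ZIP_DEFLATED` at a high compression level (slow, small). This is a generic sketch, not Pex's code; the entry names and sizes are made up.

```python
import io
import zipfile


def zip_bytes(entries, compression, compresslevel=None):
    # Write (name, data) pairs into an in-memory zip and return its bytes.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression,
                         compresslevel=compresslevel) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()


# Highly compressible stand-in for a tree of installed wheels.
entries = [(f"pkg/mod_{i}.py", b"x = 1\n" * 2000) for i in range(20)]

stored = zip_bytes(entries, zipfile.ZIP_STORED)                       # fast, large
deflated = zip_bytes(entries, zipfile.ZIP_DEFLATED, compresslevel=9)  # slow, small
```

For a PEX that is shipped over the network once and run many times, the stored variant's extra bytes may or may not be worth the zip-time savings; exposing the level lets the user decide.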
Since the performance tradeoff was so drastic in these experiments, and exposing the compression level is pretty easy to do, I forked #1686 to track doing that.
#2175 addresses this with an external dependency, which we can discuss once I can get it to pass mypy. I have been able to build a wheel for interpreter versions 3.7-3.11, implementations CPython and PyPy, and platforms manylinux, musllinux, macOS (x86 and arm), and even Windows. But figuring out how to integrate that into Pex without losing the utility of the universal py-only PEX is obviously a discussion to have.
@cosmicexplorer did you consider / try out @stuhood's comment?:
IOW, have Pex try …
That's a great idea!! Especially since the biggest perf gain by miles wasn't parallelizing but rather the caching enabled by the merge operation! |
Oh, I'm going to try that right now. |
Essentially, the approach in #2175 gives …
For the attached lock.txt (superset) and subset.txt requirements, building a PEX using `--pex-repository` (on an AWS p2x instance inside of Docker) takes ~220s, primarily inside the "zipping" phase. The full PEX command to build the subset is: