Enable LTO optimizations in release builds to reduce binary size #5904
CodSpeed Performance Report: Merging #5904 will improve performances by 14.9%.
I'm seeing a small speedup, too:
Can you also share the final released wheel sizes (compressed)?
Absolutely! I'm currently struggling to figure out how to build those (I can't find any job where that is done, and the release-pypi CI alludes to …
You can use …
Thanks @konstin! @T-256, here are your results (a bit underwhelming, actually):
Before:
After:
Btw, thanks for sharing the discussion about this in the ruff repository! If necessary, I can spend some time experimenting with build optimizations to make release builds take less time to compile. Build times did take a massive hit (at least on my machine, from ~1m to ~3m on a fresh build). From a quick read of the discussion you linked, and the others linked from it, it seems there is a lot more to consider here than I initially thought.
I'm supportive of this. Saving on binary size is a win for the registry, CDN, and users. We can always revisit in the future.
I would say I'm mildly negative on this change, largely for the same reasons I discussed here for Building … With that said, I'm already using … I'm not strongly opposed to this change, though, so I'm fine with keeping it for now and evaluating just how much the longer build times annoy us.
First of all, thanks for joining into the conversation @BurntSushi. Your comments on astral-sh/ruff#9224 and similar were insanely insightful!
The … I have also opened #5909 to try to further optimize binary size without giving up too much performance or build time :)
Oof. Right. That needs to be undone. It is not tenable to wait minutes between benchmarking runs IMO. (The compile times are already too long.)
I'm sorry for the build slowdowns! I didn't initially realize these builds were used for anything other than releases, where I thought it would be fine to give up some build time to reduce the final binary size. If #5909 is not good enough (from a clean build, it now takes only about 30s more than before and 1m 45s less than with LTO alone, at a slight performance cost), I would understand this PR being reverted.
I don't think we need to revert the PR. I think we just need to make the …
This PR tweaks the change made in #5904 so that the `profiling` Cargo profile does _not_ have LTO enabled. With LTO enabled, compile times even after just doing a `touch crates/uv/src/bin/uv.rs` are devastating:

```
$ cargo b --profile profiling -p uv
   Compiling uv-cli v0.0.1 (/home/andrew/astral/uv/crates/uv-cli)
   Compiling uv v0.2.34 (/home/andrew/astral/uv/crates/uv)
    Finished `profiling` profile [optimized + debuginfo] target(s) in 3m 47s
```

Even with `lto = "thin"`, compile times are not great, but they are an improvement:

```
$ cargo b --profile profiling -p uv
   Compiling uv v0.2.34 (/home/andrew/astral/uv/crates/uv)
    Finished `profiling` profile [optimized + debuginfo] target(s) in 53.98s
```

But our original configuration for `profiling`, prior to #5904, had LTO completely disabled:

```
$ cargo b --profile profiling -p uv
   Compiling uv v0.2.34 (/home/andrew/astral/uv/crates/uv)
    Finished `profiling` profile [optimized + debuginfo] target(s) in 30.09s
```

This gives reasonable-ish compile times, although I still want them to be better. This setup does risk that what we measure in benchmarks is not what we ship, but making the two the same would require either making compile times much worse for development, or taking a hit to binary size and a slight hit to runtime performance in our release builds. I would weakly prefer that we accept the hit to runtime performance and binary size in order to bring our measurements in line with what we ship, but I _strongly_ feel that we should not have compile times exceeding minutes for development. When doing performance testing, long compile times, for me anyway, break "flow" state.

A confounding factor here was that #5904 enabled LTO for the `release` profile, but the `dist` profile (used by `cargo dist`) was still setting it to `lto = "thin"`. However, because of shenanigans in our release pipeline, we were actually using the `release` profile for binaries we ship. This PR does not make any changes here other than to remove `lto = "thin"` from the `dist` profile to make the fact that they are the same a bit clearer.

cc @davfsa
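The three `profiling` build times quoted in this PR description can be compared directly by parsing Cargo's `Finished` lines. The Python sketch below is one way to do that; its regex only handles the `Nm Ns` and `N.NNs` duration forms that appear in those transcripts (not builds that run for hours), and the dictionary labels are just descriptive names chosen for the three configurations.

```python
import re

# Matches the duration at the end of a Cargo "Finished" line, e.g.
# "... target(s) in 3m 47s" or "... target(s) in 53.98s".
FINISHED_RE = re.compile(r"target\(s\) in (?:(\d+)m )?(\d+(?:\.\d+)?)s\s*$")


def build_seconds(finished_line: str) -> float:
    """Convert the duration in a Cargo "Finished" line to seconds."""
    match = FINISHED_RE.search(finished_line)
    if match is None:
        raise ValueError(f"not a Cargo 'Finished' line: {finished_line!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# The incremental `profiling` builds from the transcripts in this PR.
FINISHED_LINES = {
    "LTO enabled": "Finished `profiling` profile [optimized + debuginfo] target(s) in 3m 47s",
    'lto = "thin"': "Finished `profiling` profile [optimized + debuginfo] target(s) in 53.98s",
    "LTO disabled": "Finished `profiling` profile [optimized + debuginfo] target(s) in 30.09s",
}

# Express each build time relative to the LTO-disabled configuration.
baseline = build_seconds(FINISHED_LINES["LTO disabled"])
for label, line in FINISHED_LINES.items():
    secs = build_seconds(line)
    print(f"{label}: {secs:.2f}s ({secs / baseline:.1f}x the LTO-disabled build)")
```

On these numbers, the LTO-enabled incremental build takes roughly 7.5x as long as the LTO-disabled one (227s vs 30.09s), and `lto = "thin"` roughly 1.8x.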
Unfortunately, this change was not tested against all of our release builds; it's failing in #5984. I'm considering reverting it to unblock the release and revisiting it later.
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.2.33` -> `0.2.35` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

### Release Notes

astral-sh/uv (astral-sh/uv)

### [`v0.2.35`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0235)

[Compare Source](astral-sh/uv@0.2.34...0.2.35)

##### CLI

- Deprecate `--system` and `--no-system` in `uv venv` ([#5925](astral-sh/uv#5925))
- Make `--upgrade` imply `--refresh` ([#5943](astral-sh/uv#5943))
- Warn when there are missing bounds on transitive dependencies with `--resolution-strategy lowest` ([#5953](astral-sh/uv#5953))

##### Configuration

- Add support for `no-build-isolation-package` ([#5894](astral-sh/uv#5894))

##### Performance

- Enable LTO optimizations in release builds to reduce binary size ([#5904](astral-sh/uv#5904))
- Prefetch metadata in `--no-deps` mode ([#5918](astral-sh/uv#5918))

##### Bug fixes

- Display portable paths in POSIX virtual environment activation commands ([#5956](astral-sh/uv#5956))
- Respect subdirectories when locating Git workspaces ([#5944](astral-sh/uv#5944))

##### Documentation

- Improve the `uv venv` CLI documentation ([#5963](astral-sh/uv#5963))

### [`v0.2.34`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0234)

[Compare Source](astral-sh/uv@0.2.33...0.2.34)

##### Enhancements

- Always strip in release mode ([#5745](astral-sh/uv#5745))
- Assume `git+` prefix when URLs end in `.git` ([#5868](astral-sh/uv#5868))
- Support build constraints ([#5639](astral-sh/uv#5639))

##### CLI

- Create help sections for build, install, resolve, and index ([#5693](astral-sh/uv#5693))
- Improve CLI documentation for global options ([#5834](astral-sh/uv#5834))
- Improve `--python` CLI documentation ([#5869](astral-sh/uv#5869))
- Improve display order of top-level commands ([#5830](astral-sh/uv#5830))

##### Bug fixes

- Allow downloading wheels for metadata with `--no-binary` ([#5707](astral-sh/uv#5707))
- Reject `pyproject.toml` in `--config-file` ([#5842](astral-sh/uv#5842))
- Remove double-proxy nodes in error reporting ([#5738](astral-sh/uv#5738))
- Respect pre-release preferences from input files ([#5736](astral-sh/uv#5736))
- Support overlapping local and non-local requirements in forks ([#5812](astral-sh/uv#5812))
Summary
In the same spirit as #5745, release builds can be made somewhat more size-efficient by enabling LTO, which removes dead code (both in uv itself, for example functions that end up fully inlined, and in the libraries it depends on). It also has the side effect (closer to what LTO was designed for) of slightly speeding up uv.
In this case, I measured a 5 MB size decrease!
Unfortunately, this change also comes with the disadvantage of more than doubling the build time of a clean build on my machine (see "Test Plan"). I have opened this pull request to show my findings and suggest this as an option.
I have also started looking into what effects optimizing for size rather than speed could have, but that calls for another PR.
Test Plan
Comparing the binary size before and after (starting from a fresh clone of the repository):
System info:
Before:
After:
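To compare a before and after binary both in raw size and in something closer to the compressed wheel size asked about in the discussion, a Python sketch could read each file and deflate it with `zlib`. Wheels are zip archives that are usually deflate-compressed, but the compression level used by the actual wheel builder is an assumption here (the sketch uses level 6), so the compressed figure is only an approximation. The function names are illustrative, not part of any uv tooling.

```python
import zlib


def sizes(path: str, level: int = 6) -> tuple[int, int]:
    """Return (raw size, deflate-compressed size) in bytes for a file.

    The compression level is an assumed stand-in for whatever the wheel
    builder actually uses.
    """
    with open(path, "rb") as f:
        data = f.read()
    return len(data), len(zlib.compress(data, level))


def compare(before_path: str, after_path: str) -> dict[str, float]:
    """Percentage change (negative means smaller) for raw and compressed sizes."""
    raw_before, comp_before = sizes(before_path)
    raw_after, comp_after = sizes(after_path)
    return {
        "raw_pct": 100.0 * (raw_after - raw_before) / raw_before,
        "compressed_pct": 100.0 * (comp_after - comp_before) / comp_before,
    }
```

For example, `compare("uv-before", "uv-after")` (hypothetical paths to a binary built before and after the change) returns both percentages side by side. Because already-compressed sizes shrink less in absolute terms, a large raw reduction can look noticeably smaller once compressed.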