-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Take priority into account within the pending queue #11032
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @weihanglo (or someone else) soon. Please see the contribution instructions for more information. |
649376e
to
ce441d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change is simple but straightforward. I cannot think of anything blocking this from getting merged. Thank you for putting so much time on these benchmarks!
src/cargo/core/compiler/job_queue.rs
Outdated
// If multiple pieces of work are waiting in the pending queue, we can sort it according to | ||
// their priorities: higher priorities should be scheduled sooner. | ||
if self.pending_queue.len() > 1 { | ||
self.pending_queue.sort_by(|(unit_a, _), (unit_b, _)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better with sort_by_cached_key
? Supposed cargo does some hashes here.
(Benchmark needed, though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we going to do a sort here anyway, then self.queue.dequeue()
should not guarantee what order they items are returned in. In fact it should probably just return an iterator.
Also if we sort from lowest priority to highest priority, then when remove an item we can use the .pop()
where we are now using self.pending_queue.remove(0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weihanglo Yeah, this is why I also mentioned whether you wanted a different structure where the priority would be part of the data stored in the pending queue.
Ultimately it won't make a huge difference, especially with the allocation involved in caching, but those Unit
s are indeed pretty big and their hashing may not be super cheap, so it should still improve a bit indeed.
I've pushed a change into a sort_by_cached_key
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Eh2406 the sort will only apply if there are units in the pending queue here: so IIUC there can be cases (maybe when there are always jobserver tokens available, or at j1) where the units could be processed one by one, in the order in which they are dequeued from the dependency queue. For these cases, we'd still need to keep the max-priority ordering in self.queue.dequeue()
, right ?
I've also pushed your suggestion to pull the high priority items from the end of the queue. (We could probably use the .pop()
as a replacement for the !is_empty()
check at the top of the loop, but I think the let chains to do that are still unstable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Using something like priority queue might be better. It is somehow a bit weird here
that we push the highest priority item from self.queue.dequeue()
, and then immediately sort pending_queue
in ascending order.
Anyway, we can always do it in a follow-up PR. I feel like we can merge it as-is. It would be wonderful if you can do another benchmark before r+. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, what kind of benchmark would you like to see ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular one. The same benchmark you did is good enough. Just want to leave a trace for how much performance gains Cargo benefits from this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somehow a bit weird here that we push the highest priority item from
self.queue.dequeue()
, and then immediately sortpending_queue
in ascending order.
It is, and if we want to keep a Vec for now instead of a max-heap, we can insert the unit directly at the correct index instead.
No particular one. The same benchmark you did is good enough
In the report there are 800 benchmarks (× 7 runs × 2 versions of cargo × 3 parallelism settings × 2 build modes), so re-running everything is not going to be possible soon unfortunately because it takes multiple days. I'll re-run some important ones, on a few different machines, e.g. tokio's mini-redis
(I've been benchmarking that when pushing to this branch today, and it's still showing -9% at j4/6 for example) and some of the most positively and negatively affected crates.
I'll do those 2 things tomorrow.
3f55dd4
to
c685acd
Compare
If multiple pieces of work are waiting in the pending queue, we can sort it according to their priorities: higher priorities should be scheduled sooner. They are more often than not wider than pure chains, and this should create more parallelism opportunities earlier in the pipeline: a high priority piece of work represents more future pieces of work down the line. This is a scheduling tradeoff that behaves differently for each project, machine configuration, amount of available parallelism at a given point in time, etc, but seems to help more often than hinders, at low-core counts and with enough units of work to be done, so that there is jobserver token contention where choosing a "better" piece of work to work on next is possible.
This cleans up the priority-sorted scheduling by removing the need for a priority accessor that would hash the nodes, and allows inserting in the queue at the correctly sorted position to remove the insert + sort combination.
c685acd
to
4ffe98a
Compare
Since we're keeping the pending queue as a vec for now, I've added the priority to its tuples: that removes the need for push/sort/hashing, we can more easily and efficiently insert the items at the correctly sorted position. Let me know if you like it better. @weihanglo here are some benchmarks you asked for as well: I've re-ran some of the highest relative wins and losses from the report, as well as cargo, and tokio's mini-redis (to contrast with the old version of a tokio crate we see in the losses). So here are the hyperfine results of 10 clean runs (with 2 additional warmup runs), for check and debug builds,
for the crates:
(In these results, |
@rfcbot merge This keeps the I feel like this PR just respects the existing priority scheduling mechanism. It doesn't invent anything new, so it might be safe to merge (at least the benchmark result looks generally good). One other thing to enhance is using some data structure other than |
Team member @weihanglo has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
.map(|(key, _)| key.clone()) | ||
.max_by_key(|k| self.priority[k])?; | ||
.map(|(key, _)| (key.clone(), self.priority[key])) | ||
.max_by_key(|(_, priority)| *priority)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This max_by_key
is unnecessary as we are removing all and then sorting. If we change it to .next()
, only the test will need to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove it if you want ? It doesn't seem to hurt today, and makes the order at least deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got nerd sniped, but it doesn't matter. Leave it bee.
🔔 This is now entering its final comment period, as per the review above. 🔔 |
@bors r+ |
☀️ Test successful - checks-actions |
10 commits in 646e9a0b9ea8354cc409d05f10e8dc752c5de78e..082503982ea0fb7a8fd72210427d43a2e2128a63 2022-09-02 14:29:28 +0000 to 2022-09-13 17:49:38 +0000 - Take priority into account within the pending queue (rust-lang/cargo#11032) - fix(add): Clarify which version the features are added for (rust-lang/cargo#11075) - doc: clarify config-relative paths for `--config <path>` (rust-lang/cargo#11079) - Do not add home bin path to PATH if it's already there (rust-lang/cargo#11023) - Don't use `for` loop on an `Option` (rust-lang/cargo#11081) - Remove dead code (rust-lang/cargo#11080) - Change progress indicator for sparse registries (rust-lang/cargo#11068) - chore(ci): Ensure intradoc links are valid (rust-lang/cargo#11055) - Cache index files based on contents hash (rust-lang/cargo#11044) - fix: specifies the max length for crate name (rust-lang/cargo#11051)
Update cargo 10 commits in 646e9a0b9ea8354cc409d05f10e8dc752c5de78e..082503982ea0fb7a8fd72210427d43a2e2128a63 2022-09-02 14:29:28 +0000 to 2022-09-13 17:49:38 +0000 - Take priority into account within the pending queue (rust-lang/cargo#11032) - fix(add): Clarify which version the features are added for (rust-lang/cargo#11075) - doc: clarify config-relative paths for `--config <path>` (rust-lang/cargo#11079) - Do not add home bin path to PATH if it's already there (rust-lang/cargo#11023) - Don't use `for` loop on an `Option` (rust-lang/cargo#11081) - Remove dead code (rust-lang/cargo#11080) - Change progress indicator for sparse registries (rust-lang/cargo#11068) - chore(ci): Ensure intradoc links are valid (rust-lang/cargo#11055) - Cache index files based on contents hash (rust-lang/cargo#11044) - fix: specifies the max length for crate name (rust-lang/cargo#11051)
How are priorities determined? Can end users influence them? |
Unfortunately, no.
Still naive. Basically, every unit gets a same fixed cost1, and the priority of each is a sum of all the cost of its children plus its own2. We're welcome for more real world data and experiments! Footnotes |
Pkgsrc changes: * We now manage to build for mipsel-unknown-netbsd, but despite the target spec saying cpu = "mips3", the compiler manages to emit 64-bit instructions which cause "illegal instruction" error. Will need more work. The mipsel-unknown-netbsd entry is commentd out since there is no 1.64.0 bootstrap. * Managed to retain the build of aarch64_be, llvm needed a patch to avoid use of neon instructions in the BE case (llvm doesn't support use of neon in BE mode). Ref. patch to src/llvm-project/llvm/lib/Support/BLAKE3/blake3_impl.h. * The minimum gcc version is now 7.x, and that includes the cross-compiler for the targets. For i386 this also needs to /usr/include/gcc-7 include files in the target root, because immintrin.h from gcc 5 is not compatible with gcc 7.x. This applies for the targets where we build against a root from netbsd-8 (sparc64, powerpc, i386), and files/gcc-wrap gets a hack for this. * Pick up tweak for -latomic inclusion from rust-lang/rust#104220 and rust-lang/rust#104572 * Retain ability to do 32-bit NetBSD, by changing from 64 to 32 bit types in library/std/src/sys/unix/thread_parker/netbsd.rs. * I've struggled a bit to get the "openssl-src" build with -latomic where it's needed. I introduce "NetBSD-generic32" system type and use it for the NetBSD mipsel target. There is another attempt to do the same in the patch to vendor/openssl-sys/build/main.rs. * Bump bootstraps to 1.64.0, checksum updates. Upstream changes: Version 1.65.0 (2022-11-03) ========================== Language -------- - [Error on `as` casts of enums with `#[non_exhaustive]` variants] (rust-lang/rust#92744) - [Stabilize `let else`](rust-lang/rust#93628) - [Stabilize generic associated types (GATs)] (rust-lang/rust#96709) - [Add lints `let_underscore_drop`, `let_underscore_lock`, and `let_underscore_must_use` from Clippy] (rust-lang/rust#97739) - [Stabilize `break`ing from arbitrary labeled blocks ("label-break-value")] (rust-lang/rust#99332) - [Uninitialized integers, floats, and raw pointers are now considered immediate UB](rust-lang/rust#98919). Usage of `MaybeUninit` is the correct way to work with uninitialized memory. - [Stabilize raw-dylib for Windows x86_64, aarch64, and thumbv7a] (rust-lang/rust#99916) - [Do not allow `Drop` impl on foreign ADTs] (rust-lang/rust#99576) Compiler -------- - [Stabilize -Csplit-debuginfo on Linux] (rust-lang/rust#98051) - [Use niche-filling optimization even when multiple variants have data] (rust-lang/rust#94075) - [Associated type projections are now verified to be well-formed prior to resolving the underlying type] (rust-lang/rust#99217) - [Stringify non-shorthand visibility correctly] (rust-lang/rust#100350) - [Normalize struct field types when unsizing] (rust-lang/rust#101831) - [Update to LLVM 15](rust-lang/rust#99464) - [Fix aarch64 call abi to correctly zeroext when needed] (rust-lang/rust#97800) - [debuginfo: Generalize C++-like encoding for enums] (rust-lang/rust#98393) - [Add `special_module_name` lint] (rust-lang/rust#94467) - [Add support for generating unique profraw files by default when using `-C instrument-coverage`] (rust-lang/rust#100384) - [Allow dynamic linking for iOS/tvOS targets] (rust-lang/rust#100636) New targets: - [Add armv4t-none-eabi as a tier 3 target] (rust-lang/rust#100244) - [Add powerpc64-unknown-openbsd and riscv64-unknown-openbsd as tier 3 targets] (rust-lang/rust#101025) - Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Don't generate `PartialEq::ne` in derive(PartialEq)] (rust-lang/rust#98655) - [Windows RNG: Use `BCRYPT_RNG_ALG_HANDLE` by default] (rust-lang/rust#101325) - [Forbid mixing `System` with direct system allocator calls] (rust-lang/rust#101394) - [Document no support for writing to non-blocking stdio/stderr] (rust-lang/rust#101416) - [`std::layout::Layout` size must not overflow `isize::MAX` when rounded up to `align`](rust-lang/rust#95295) This also changes the safety conditions on `Layout::from_size_align_unchecked`. Stabilized APIs --------------- - [`std::backtrace::Backtrace`] (https://doc.rust-lang.org/stable/std/backtrace/struct.Backtrace.html) - [`Bound::as_ref`] (https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.as_ref) - [`std::io::read_to_string`] (https://doc.rust-lang.org/stable/std/io/fn.read_to_string.html) - [`<*const T>::cast_mut`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.cast_mut) - [`<*mut T>::cast_const`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.cast_const) These APIs are now stable in const contexts: - [`<*const T>::offset_from`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.offset_from) - [`<*mut T>::offset_from`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.offset_from) Cargo ----- - [Apply GitHub fast path even for partial hashes] (rust-lang/cargo#10807) - [Do not add home bin path to PATH if it's already there] (rust-lang/cargo#11023) - [Take priority into account within the pending queue] (rust-lang/cargo#11032). This slightly optimizes job scheduling by Cargo, with typically small improvements on larger crate graph builds. Compatibility Notes ------------------- - [`std::layout::Layout` size must not overflow `isize::MAX` when rounded up to `align`] (rust-lang/rust#95295). This also changes the safety conditions on `Layout::from_size_align_unchecked`. - [`PollFn` now only implements `Unpin` if the closure is `Unpin`] (rust-lang/rust#102737). This is a possible breaking change if users were relying on the blanket unpin implementation. See discussion on the PR for details of why this change was made. - [Drop ExactSizeIterator impl from std::char::EscapeAscii] (rust-lang/rust#99880) This is a backwards-incompatible change to the standard library's surface area, but is unlikely to affect real world usage. - [Do not consider a single repeated lifetime eligible for elision in the return type] (rust-lang/rust#103450) This behavior was unintentionally changed in 1.64.0, and this release reverts that change by making this an error again. - [Reenable disabled early syntax gates as future-incompatibility lints] (rust-lang/rust#99935) - [Update the minimum external LLVM to 13] (rust-lang/rust#100460) - [Don't duplicate file descriptors into stdio fds] (rust-lang/rust#101426) - [Sunset RLS](rust-lang/rust#100863) - [Deny usage of `#![cfg_attr(..., crate_type = ...)]` to set the crate type] (rust-lang/rust#99784) This strengthens the forward compatibility lint deprecated_cfg_attr_crate_type_name to deny. - [`llvm-has-rust-patches` allows setting the build system to treat the LLVM as having Rust-specific patches] (rust-lang/rust#101072) This option may need to be set for distributions that are building Rust with a patched LLVM via `llvm-config`, not the built-in LLVM. Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Add `x.sh` and `x.ps1` shell scripts] (rust-lang/rust#99992) - [compiletest: use target cfg instead of hard-coded tables] (rust-lang/rust#100260) - [Use object instead of LLVM for reading bitcode from rlibs] (rust-lang/rust#98100) - [Enable MIR inlining for optimized compilations] (rust-lang/rust#91743) This provides a 3-10% improvement in compiletimes for real world crates. See [perf results] (https://perf.rust-lang.org/compare.html?start=aedf78e56b2279cc869962feac5153b6ba7001ed&end=0075bb4fad68e64b6d1be06bf2db366c30bc75e1&stat=instructions:u).
Pkgsrc changes: * pkglint cleanups, bump bootstrap kits to 1.65.0. * New target: mipsel-unknown-netbsd, for cpu=mips32 with soft-float. * Managed to retain the build of aarch64_be, llvm needed a patch to avoid use of neon instructions in the BE case (llvm doesn't support use of neon in BE mode). Ref. patch to src/llvm-project/llvm/lib/Support/BLAKE3/blake3_impl.h. Also submitted upstream of LLVM to the BLAKE3 maintainers. * The minimum gcc version is now 7.x, and that includes the cross-compiler for the targets. For i386 this also needs to /usr/include/gcc-7 include files in the target root, because immintrin.h from gcc 5 is not compatible with gcc 7.x. This applies for the targets where we build against a root from netbsd-8 (sparc64, powerpc, i386), and files/gcc-wrap gets a hack for this. * Pick up tweak for -latomic inclusion from rust-lang/rust#104220 and rust-lang/rust#104572 * Retain ability to do 32-bit NetBSD, by changing from 64 to 32 bit types in library/std/src/sys/unix/thread_parker/netbsd.rs. * I've tried to get the "openssl-src" build with -latomic where it's needed. I've introduced the "NetBSD-generic32" system type and use it for the NetBSD mipsel target. There is another attempt to do the same in the patch to vendor/openssl-sys/build/main.rs. Upstream changes: Version 1.66.1 (2023-01-10) =========================== - Added validation of SSH host keys for git URLs in Cargo ([CVE-2022-46176](https://www.cve.org/CVERecord?id=CVE-2022-46176)) Version 1.66.0 (2022-12-15) =========================== Language -------- - [Permit specifying explicit discriminants on all `repr(Int)` enums](rust-lang/rust#95710) ```rust #[repr(u8)] enum Foo { A(u8) = 0, B(i8) = 1, C(bool) = 42, } ``` - [Allow transmutes between the same type differing only in lifetimes](rust-lang/rust#101520) - [Change constant evaluation errors from a deny-by-default lint to a hard error](rust-lang/rust#102091) - [Trigger `must_use` on `impl Trait` for supertraits](rust-lang/rust#102287) This makes `impl ExactSizeIterator` respect the existing `#[must_use]` annotation on `Iterator`. - [Allow `..X` and `..=X` in patterns](rust-lang/rust#102275) - [Uplift `clippy::for_loops_over_fallibles` lint into rustc](rust-lang/rust#99696) - [Stabilize `sym` operands in inline assembly](rust-lang/rust#103168) - [Update to Unicode 15](rust-lang/rust#101912) - [Opaque types no longer imply lifetime bounds](rust-lang/rust#95474) This is a soundness fix which may break code that was erroneously relying on this behavior. Compiler -------- - [Add armv5te-none-eabi and thumbv5te-none-eabi tier 3 targets](rust-lang/rust#101329) - Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. - [Add support for linking against macOS universal libraries](rust-lang/rust#98736) Libraries --------- - [Fix `#[derive(Default)]` on a generic `#[default]` enum adding unnecessary `Default` bounds](rust-lang/rust#101040) - [Update to Unicode 15](rust-lang/rust#101821) Stabilized APIs --------------- - [`proc_macro::Span::source_text`](https://doc.rust-lang.org/stable/proc_macro/struct.Span.html#method.source_text) - [`uX::{checked_add_signed, overflowing_add_signed, saturating_add_signed, wrapping_add_signed}`](https://doc.rust-lang.org/stable/std/primitive.u8.html#method.checked_add_signed) - [`iX::{checked_add_unsigned, overflowing_add_unsigned, saturating_add_unsigned, wrapping_add_unsigned}`](https://doc.rust-lang.org/stable/std/primitive.i8.html#method.checked_add_unsigned) - [`iX::{checked_sub_unsigned, overflowing_sub_unsigned, saturating_sub_unsigned, wrapping_sub_unsigned}`](https://doc.rust-lang.org/stable/std/primitive.i8.html#method.checked_sub_unsigned) - [`BTreeSet::{first, last, pop_first, pop_last}`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeSet.html#method.first) - [`BTreeMap::{first_key_value, last_key_value, first_entry, last_entry, pop_first, pop_last}`](https://doc.rust-lang.org/stable/std/collections/struct.BTreeMap.html#method.first_key_value) - [Add `AsFd` implementations for stdio lock types on WASI.](rust-lang/rust#101768) - [`impl TryFrom<Vec<T>> for Box<[T; N]>`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html#impl-TryFrom%3CVec%3CT%2C%20Global%3E%3E-for-Box%3C%5BT%3B%20N%5D%2C%20Global%3E) - [`core::hint::black_box`](https://doc.rust-lang.org/stable/std/hint/fn.black_box.html) - [`Duration::try_from_secs_{f32,f64}`](https://doc.rust-lang.org/stable/std/time/struct.Duration.html#method.try_from_secs_f32) - [`Option::unzip`](https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.unzip) - [`std::os::fd`](https://doc.rust-lang.org/stable/std/os/fd/index.html) Rustdoc ------- - [Add Rustdoc warning for invalid HTML tags in the documentation](rust-lang/rust#101720) Cargo ----- - [Added `cargo remove` to remove dependencies from Cargo.toml](https://doc.rust-lang.org/nightly/cargo/commands/cargo-remove.html) - [`cargo publish` now waits for the new version to be downloadable before exiting](rust-lang/cargo#11062) See [detailed release notes](https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-166-2022-12-15) for more. Compatibility Notes ------------------- - [Only apply `ProceduralMasquerade` hack to older versions of `rental`](rust-lang/rust#94063) - [Don't export `__heap_base` and `__data_end` on wasm32-wasi.](rust-lang/rust#102385) - [Don't export `__wasm_init_memory` on WebAssembly.](rust-lang/rust#102426) - [Only export `__tls_*` on wasm32-unknown-unknown.](rust-lang/rust#102440) - [Don't link to `libresolv` in libstd on Darwin](rust-lang/rust#102766) - [Update libstd's libc to 0.2.135 (to make `libstd` no longer pull in `libiconv.dylib` on Darwin)](rust-lang/rust#103277) - [Opaque types no longer imply lifetime bounds](rust-lang/rust#95474) This is a soundness fix which may break code that was erroneously relying on this behavior. - [Make `order_dependent_trait_objects` show up in future-breakage reports](rust-lang/rust#102635) - [Change std::process::Command spawning to default to inheriting the parent's signal mask](rust-lang/rust#101077) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Enable BOLT for LLVM compilation](rust-lang/rust#94381) - [Enable LTO for rustc_driver.so](rust-lang/rust#101403) Version 1.65.0 (2022-11-03) ========================== Language -------- - [Error on `as` casts of enums with `#[non_exhaustive]` variants] (rust-lang/rust#92744) - [Stabilize `let else`](rust-lang/rust#93628) - [Stabilize generic associated types (GATs)] (rust-lang/rust#96709) - [Add lints `let_underscore_drop`, `let_underscore_lock`, and `let_underscore_must_use` from Clippy] (rust-lang/rust#97739) - [Stabilize `break`ing from arbitrary labeled blocks ("label-break-value")] (rust-lang/rust#99332) - [Uninitialized integers, floats, and raw pointers are now considered immediate UB](rust-lang/rust#98919). Usage of `MaybeUninit` is the correct way to work with uninitialized memory. - [Stabilize raw-dylib for Windows x86_64, aarch64, and thumbv7a] (rust-lang/rust#99916) - [Do not allow `Drop` impl on foreign ADTs] (rust-lang/rust#99576) Compiler -------- - [Stabilize -Csplit-debuginfo on Linux] (rust-lang/rust#98051) - [Use niche-filling optimization even when multiple variants have data] (rust-lang/rust#94075) - [Associated type projections are now verified to be well-formed prior to resolving the underlying type] (rust-lang/rust#99217) - [Stringify non-shorthand visibility correctly] (rust-lang/rust#100350) - [Normalize struct field types when unsizing] (rust-lang/rust#101831) - [Update to LLVM 15](rust-lang/rust#99464) - [Fix aarch64 call abi to correctly zeroext when needed] (rust-lang/rust#97800) - [debuginfo: Generalize C++-like encoding for enums] (rust-lang/rust#98393) - [Add `special_module_name` lint] (rust-lang/rust#94467) - [Add support for generating unique profraw files by default when using `-C instrument-coverage`] (rust-lang/rust#100384) - [Allow dynamic linking for iOS/tvOS targets] (rust-lang/rust#100636) New targets: - [Add armv4t-none-eabi as a tier 3 target] (rust-lang/rust#100244) - [Add powerpc64-unknown-openbsd and riscv64-unknown-openbsd as tier 3 targets] (rust-lang/rust#101025) - Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Don't generate `PartialEq::ne` in derive(PartialEq)] (rust-lang/rust#98655) - [Windows RNG: Use `BCRYPT_RNG_ALG_HANDLE` by default] (rust-lang/rust#101325) - [Forbid mixing `System` with direct system allocator calls] (rust-lang/rust#101394) - [Document no support for writing to non-blocking stdio/stderr] (rust-lang/rust#101416) - [`std::layout::Layout` size must not overflow `isize::MAX` when rounded up to `align`](rust-lang/rust#95295) This also changes the safety conditions on `Layout::from_size_align_unchecked`. Stabilized APIs --------------- - [`std::backtrace::Backtrace`] (https://doc.rust-lang.org/stable/std/backtrace/struct.Backtrace.html) - [`Bound::as_ref`] (https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.as_ref) - [`std::io::read_to_string`] (https://doc.rust-lang.org/stable/std/io/fn.read_to_string.html) - [`<*const T>::cast_mut`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.cast_mut) - [`<*mut T>::cast_const`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.cast_const) These APIs are now stable in const contexts: - [`<*const T>::offset_from`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.offset_from) - [`<*mut T>::offset_from`] (https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.offset_from) Cargo ----- - [Apply GitHub fast path even for partial hashes] (rust-lang/cargo#10807) - [Do not add home bin path to PATH if it's already there] (rust-lang/cargo#11023) - [Take priority into account within the pending queue] (rust-lang/cargo#11032). This slightly optimizes job scheduling by Cargo, with typically small improvements on larger crate graph builds. Compatibility Notes ------------------- - [`std::layout::Layout` size must not overflow `isize::MAX` when rounded up to `align`] (rust-lang/rust#95295). This also changes the safety conditions on `Layout::from_size_align_unchecked`. - [`PollFn` now only implements `Unpin` if the closure is `Unpin`] (rust-lang/rust#102737). This is a possible breaking change if users were relying on the blanket unpin implementation. See discussion on the PR for details of why this change was made. - [Drop ExactSizeIterator impl from std::char::EscapeAscii] (rust-lang/rust#99880) This is a backwards-incompatible change to the standard library's surface area, but is unlikely to affect real world usage. - [Do not consider a single repeated lifetime eligible for elision in the return type] (rust-lang/rust#103450) This behavior was unintentionally changed in 1.64.0, and this release reverts that change by making this an error again. - [Reenable disabled early syntax gates as future-incompatibility lints] (rust-lang/rust#99935) - [Update the minimum external LLVM to 13] (rust-lang/rust#100460) - [Don't duplicate file descriptors into stdio fds] (rust-lang/rust#101426) - [Sunset RLS](rust-lang/rust#100863) - [Deny usage of `#![cfg_attr(..., crate_type = ...)]` to set the crate type] (rust-lang/rust#99784) This strengthens the forward compatibility lint deprecated_cfg_attr_crate_type_name to deny. - [`llvm-has-rust-patches` allows setting the build system to treat the LLVM as having Rust-specific patches] (rust-lang/rust#101072) This option may need to be set for distributions that are building Rust with a patched LLVM via `llvm-config`, not the built-in LLVM. Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Add `x.sh` and `x.ps1` shell scripts] (rust-lang/rust#99992) - [compiletest: use target cfg instead of hard-coded tables] (rust-lang/rust#100260) - [Use object instead of LLVM for reading bitcode from rlibs] (rust-lang/rust#98100) - [Enable MIR inlining for optimized compilations] (rust-lang/rust#91743) This provides a 3-10% improvement in compiletimes for real world crates. See [perf results] (https://perf.rust-lang.org/compare.html?start=aedf78e56b2279cc869962feac5153b6ba7001ed&end=0075bb4fad68e64b6d1be06bf2db366c30bc75e1&stat=instructions:u).
Update cargo 10 commits in 646e9a0b9ea8354cc409d05f10e8dc752c5de78e..082503982ea0fb7a8fd72210427d43a2e2128a63 2022-09-02 14:29:28 +0000 to 2022-09-13 17:49:38 +0000 - Take priority into account within the pending queue (rust-lang/cargo#11032) - fix(add): Clarify which version the features are added for (rust-lang/cargo#11075) - doc: clarify config-relative paths for `--config <path>` (rust-lang/cargo#11079) - Do not add home bin path to PATH if it's already there (rust-lang/cargo#11023) - Don't use `for` loop on an `Option` (rust-lang/cargo#11081) - Remove dead code (rust-lang/cargo#11080) - Change progress indicator for sparse registries (rust-lang/cargo#11068) - chore(ci): Ensure intradoc links are valid (rust-lang/cargo#11055) - Cache index files based on contents hash (rust-lang/cargo#11044) - fix: specifies the max length for crate name (rust-lang/cargo#11051)
Update cargo 10 commits in 646e9a0b9ea8354cc409d05f10e8dc752c5de78e..082503982ea0fb7a8fd72210427d43a2e2128a63 2022-09-02 14:29:28 +0000 to 2022-09-13 17:49:38 +0000 - Take priority into account within the pending queue (rust-lang/cargo#11032) - fix(add): Clarify which version the features are added for (rust-lang/cargo#11075) - doc: clarify config-relative paths for `--config <path>` (rust-lang/cargo#11079) - Do not add home bin path to PATH if it's already there (rust-lang/cargo#11023) - Don't use `for` loop on an `Option` (rust-lang/cargo#11081) - Remove dead code (rust-lang/cargo#11080) - Change progress indicator for sparse registries (rust-lang/cargo#11068) - chore(ci): Ensure intradoc links are valid (rust-lang/cargo#11055) - Cache index files based on contents hash (rust-lang/cargo#11044) - fix: specifies the max length for crate name (rust-lang/cargo#11051)
This is the PR for the work discussed in this zulip thread and whose detailed description and some results are available here with graphs, summaries and raw data -- much of which was shown in the thread as well.
Units of works have a computed priority that is used in the dependency queue, so that higher priorities are dequeued sooner, as documented here.
This PR further applies that principle to the next step before being executed: if multiple pieces of work are waiting in the pending queue, we can sort that according to their priorities. Here as well, higher priorities should be scheduled sooner.
They are more often than not wider than pure chains of dependencies, and this should create more parallelism opportunities earlier in the pipeline: a high priority piece of work represents more future pieces of work down the line, and try to sustain CPU utilization longer (at the potential cost of this parallelism being distributed differently than today, between cargo invoking rustc and rustc's own codegen threads -- when applicable).
This is a scheduling tradeoff that behaves differently for each project, machine configuration, amount of available parallelism at a given point in time, etc, but seems to help more often than hinders: at low-core counts and with enough units of work to be done, so that there is jobserver token contention where choosing a "better" piece of work to work on next may be possible.
There's of course a bit of noise in the results linked above and 800 or so of the most popular crates.io crates is still a limited sample, but they're mostly meant to show a hopefully positive trend: while there are improvements and regressions, that trend looks to be more positive than negative, with the wins being more numerous and with higher amplitudes than the corresponding losses.
(A report on another scheduling experiment -- a superset of this PR, where I also simulate users manually giving a higher priority to
syn
,quote
,serde_derive
-- is available here and also improves this PR's results: the regressions are decreased in number and amplitude, whereas the improvements are bigger and more numerous. So that could be further work to iterate upon this one)Since this mostly applies to clean builds, for low core counts, and with a sufficient number of dependencies to have some items in the pending queue, I feel this also applies well to CI use-cases (esp. on the free tiers).
Somewhat reassuring as well, and discussed in the thread but not in the report: I've also tried to make sure cargo and bootstrapping rustc are not negatively affected. cargo saw some improvements, and bootstrap stayed within today's variance of +/- 2 to 3%. Similarly, since 3y old versions of some tokio crates (
0.2.0-alpha.1
) were the most negatively affected, I've also checked that recent tokio versions (1.19
) are not disproportionately impacted: their simple readme example, the more idiomaticmini-redis
sample, and some of my friends' tokio projects were either unaffected or saw some interesting improvements.And here's a
cargo check -j2
graph to liven up this wall of text:I'm not a cargo expert so I'm not sure whether it would be preferable to integrate priorities deeper than just the dependency queue, and e.g. have
Unit
s contain a dedicated field or similar. So in the meantime I've done the simplest thing: just sort the pending queue and ask the units' priorities to the dep queue.We could just as well have the priority recorded as part of the pending queue tuples themselves, or have that be some kind of priority queue/max heap instead of a Vec.
Let me know which you prefer, but it's in essence a very simple change as-is.