Reverse Timsort scan direction #107191
Conversation
This is more clear about the intent of the pointer and avoids problems if the allocation returns a null pointer.
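To make that first point concrete, here is a minimal sketch of the pattern being described (the helper name and shape are hypothetical, not the code in this PR): the pointer's non-null intent is expressed in the type via `NonNull`, and a null return from the allocator is handled explicitly instead of being silently carried around as a raw pointer.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

// Hypothetical helper: allocate a scratch buffer and return `NonNull<T>`,
// so the "never null" intent lives in the type. A null return from the
// allocator is routed to `handle_alloc_error` rather than ignored.
fn alloc_run_buffer<T>(len: usize) -> NonNull<T> {
    assert!(
        len > 0 && std::mem::size_of::<T>() > 0,
        "this sketch only handles non-zero-sized allocations"
    );
    let layout = Layout::array::<T>(len).expect("capacity overflow");
    // SAFETY: the layout has non-zero size because of the assert above.
    let ptr = unsafe { alloc(layout) } as *mut T;
    NonNull::new(ptr).unwrap_or_else(|| handle_alloc_error(layout))
}
```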
Avoid duplicate insertion sort implementations. Optimize implementations.
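As a rough illustration of sharing one insertion-sort core instead of duplicating it (names and shapes are illustrative, not the library's internals), a "shift the last element into place" primitive can serve both the full insertion sort and the incremental case:

```rust
// Shift the last element of `v` left until it is in sorted position,
// assuming `v[..v.len() - 1]` is already sorted.
fn insert_tail<T, F>(v: &mut [T], is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    if v.is_empty() {
        return;
    }
    let mut i = v.len() - 1;
    while i > 0 && is_less(&v[i], &v[i - 1]) {
        v.swap(i - 1, i);
        i -= 1;
    }
}

// Full insertion sort expressed in terms of the same primitive,
// growing the sorted prefix one element at a time.
fn insertion_sort<T, F>(v: &mut [T], is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    for i in 2..=v.len() {
        insert_tail(&mut v[..i], is_less);
    }
}
```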
Memory pre-fetching prefers forward scanning over backwards scanning, and the code-gen is usually better. For the most sensitive types, such as integers, runs are planned to be merged bidirectionally at once, so there is no benefit in scanning backwards. The largest perf gains are seen for fully ascending and descending inputs, which see 1.5x speedups. Random inputs benefit too, and while some patterns can lose out, those losses are minimal.
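For a sense of what forward scanning means here, a simplified sketch of detecting the next run from the front of the slice (an illustration of the direction change under discussion, not the PR's actual implementation):

```rust
// Detect the length of the run starting at the front of `v`, scanning
// forward, and report whether it was found strictly descending so the
// caller can reverse it in place before merging.
fn find_run<T, F>(v: &[T], is_less: &mut F) -> (usize, bool)
where
    F: FnMut(&T, &T) -> bool,
{
    let len = v.len();
    if len < 2 {
        return (len, false);
    }

    let mut end = 2;
    if is_less(&v[1], &v[0]) {
        // Strictly descending run: extend while each element is less than its predecessor.
        while end < len && is_less(&v[end], &v[end - 1]) {
            end += 1;
        }
        (end, true)
    } else {
        // Non-descending run: extend while elements do not decrease.
        while end < len && !is_less(&v[end], &v[end - 1]) {
            end += 1;
        }
        (end, false)
    }
}
```

A caller would reverse the run in place when the second tuple field is true, then continue scanning and merging runs left to right, which keeps memory accesses moving forward.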
r? @m-ou-se (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
r? thomcc |
@bors try @rust-timer queue |
⌛ Trying commit f297afa with merge ff841929044c3390745330612525de1a62492383... |
☀️ Try build successful - checks-actions |
Finished benchmarking commit (ff841929044c3390745330612525de1a62492383): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDED. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
@thomcc looking at the regressions and improvements specific to this PR, I get the impression there is no clear win or loss here. Also it seems the magnitude of change is rather small. But I'm not familiar with these benchmarks and their significance, so I'd like to hear your impression. It should also be said that these changes are mostly of a setup nature, and the next PR plans to introduce the first chunk of larger speedups. |
@Voultapher Not sure. It could be noise, but it looks like the regressions are more significant than the improvements. Note that there are 4 primary benchmarks regressed vs 1 secondary (e.g. synthetic) benchmark which improved. I'll try more runs in case it's noise, but it's worth investigating. @bors try @rust-timer queue runs=5 |
⌛ Trying commit 5eff264 with merge 82bc25cb7e57ff1e01bdc1a76269d8c5ff08d2f3... |
Finished benchmarking commit (82bc25cb7e57ff1e01bdc1a76269d8c5ff08d2f3): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDED. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
@rustbot author |
@thomcc looking at the two runs, the second one has one primary improvement of 0.5% and one primary regression of 0.5% in the same crate, as well as one further regression of 0.3% in the same crate. The bootstrap timings look all over the place. |
@rustbot ready |
Hm, fair enough (to be clear: my pickiness here is just to ensure we don't land optimizations that are actually pessimizations, I think the change is good in general). (I'll do my review this weekend) |
One open question is how much sort performance even influences compiler performance, as IIUC this benchmark suite is focused on compiler performance only. |
It is. The compiler definitely performs sorts though, and it wouldn't surprise me if some were in sensitive positions. |
Okay, it took a jillion years but I am convinced of this code's correctness. Thanks. @bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (96834f0): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDED. Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression.
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
Regressions are small enough that I think we don't need to investigate this closely. @rustbot label: perf-regression-triaged |
… r=Mark-Simulacrum Fix no_global_oom_handling build `provide_sorted_batch` in core is incorrectly marked with `#[cfg(not(no_global_oom_handling))]` which prevents core from building with the cfg enabled. Nothing in `core` allocates memory (including this function). The `cfg` gate is incorrect. cc `@dpaoliello` r? `@wesleywiser` The cfg was added by rust-lang#107191
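For readers unfamiliar with the cfg in question, a hedged sketch of the convention it refers to (the function names here are made up): items that allocate are gated so they disappear under `--cfg no_global_oom_handling`, while items that never allocate must not carry the gate, or `core` fails to build with the cfg enabled.

```rust
// Allocates a scratch buffer, so it is compiled out when building with
// `--cfg no_global_oom_handling`.
#[cfg(not(no_global_oom_handling))]
pub fn sort_with_scratch<T: Ord + Clone>(v: &mut [T]) {
    let mut scratch: Vec<T> = v.to_vec(); // allocation happens here
    scratch.sort();
    v.clone_from_slice(&scratch);
}

// Purely in-place and allocation-free, so it needs no gate and stays
// available even when global OOM handling is disabled.
pub fn sort_in_place<T: Ord>(v: &mut [T]) {
    v.sort_unstable();
}
```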
Another PR in the series of stable sort improvements. Best reviewed by looking at the individual commits.
The main perf gain here is for fully ascending (sorted) or reversed inputs of cheap-to-compare types such as `u64`; these see a ~1.5x speedup. Types such as `String`, with indirect pre-fetching, see only minor changes. Further speedups are planned in future PRs, so I wouldn't spend too much time on benchmarks here.