RangeInclusive iteration performance improvement. #57378

matthieu-m · 2019-01-06T15:49:18Z

The current implementation of Iterator::{next, next_back} for
RangeInclusive leads to sub-optimal performance of loops as LLVM is not
capable of splitting the loop into a first-pass initialization
(computing is_empty) followed by the actual loop. This results in each
iteration performing two conditional jumps, which not only impacts the
performance of unoptimized loops, but also inhibits unrolling and
vectorization.

The proposed implementation switches things around, performing extra
work only on the last iteration of the loop. This results in even
unoptimized loops performing a single conditional jump in all but the
last iteration, matching Range's performance, as well as letting LLVM
unroll and vectorize when it would do so for Range's loop.

As a result, it should make iterating on inclusive ranges as fast as
iterating on exclusive ones; avoiding a papercut performance pitfall.

Unfortunately, it also appears to foil LLVM Loop Splitting optimization.

rust-highfive · 2019-01-06T15:49:21Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @rkruppe (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

rust-highfive · 2019-01-06T16:58:41Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:3653fa10:start=1546789846190420746,finish=1546789915198335434,duration=69007914688
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

matthieu-m · 2019-01-06T17:16:57Z

The failure seems legitimate, LLVM apparently fails to constant-fold the loop over a RangeInclusive in test/codegen/issue-45222.rs.

I've reproduced the issue on the playground (https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=c6ad080bf6386dab551c2bed1ad6dbfb); so I'll have to fiddle with this to understand what is blocking LLVM.

ranma42 · 2019-01-06T20:02:36Z

This is somewhat surprising, given that

fn foo3c(n: u64) -> u64 {
    let mut count = 0;
    (0..n).for_each(|_| {
        (0..n).chain(::std::iter::once(n)).rev().for_each(|j| {
            count += j;
        })
    });
    count
}

constant-folds just fine 🤔

matthieu-m · 2019-01-08T19:22:24Z

Alright, let's go nuts: https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=c23c205c5f6dcdfeb958c2a6cf83ecdb .

// Doesn't const-fold (NAY)
fn triangle_inc_chain(n: u64) -> u64 {
    let mut count = 0;
    for j in (0..n).chain(::std::iter::once(n)) {
        count += j;
    }
    count
}

// Does const-fold (YAY)
fn triangle_inc_chain_foreach(n: u64) -> u64 {
    let mut count = 0;
    (0..n).chain(::std::iter::once(n)).for_each(|j| count += j);
    count
}

At this point, I'm really wondering what trips up LLVM.

Note: use of explicit for vs for_each doesn't impact the FixedRangeInclusive, that would be too simple.

@rkruppe : I am thinking that this test, as written, is bad. Whether LLVM const-fold or not seems to have no relation to the "tightness" of the generated LLVM IR, or its overall performance. It seems that we would be better serve by a check which actually verifies the number of conditional jump involved in the inner loop, rather than using const-folding as a proxy for performance.

scottmcm · 2019-01-09T00:41:58Z

Cross-reference: #56563

matthieu-m · 2019-01-12T13:52:53Z

Performance discussion should be accompanied by benchmarks, so I put together a number of benchmarks and used criterion to evaluate the relative performance of:

exclusive ranges: 0..(n+1).
chain: (0..n).chain(::std::iter::once(n)).
inclusive ranges: 0..=n.
this PR inclusive ranges: inclusive(0, n).

And the results are the following:

Benchmark	Exclusive	Chain	Inclusive	This PR
Sum	1.1429 ns	4.7870 ns	0.971,05 ns	787,530 ns
Triangle Foreach	1.1357 ns	1.5852 ns	0.980,40 ns	806,090 ns
Triangle Loop	1.1351 ns	1,164,100 ns	0.977,86 ns	835,870 ns
Add Mul Foreach	1.1721 ms	1.1737 ms	1.1747 ms	1.1722 ms
Add Mul Loop	1.2204 ms	1.1685 ms	1.1731 ms	1.1704 ms
Pythagorean Triples	972.10 us	982.73 us	1,859.5 us	1,232.8 us

(see gist for details of each benchmark, I reported only the black-hole cases: https://gist.github.com/matthieu-m/df8dcfed3e23ca83ea5abf9e7b3ca4d3)

This yields two conclusions:

The Rust code in this PR yields on par, or better, assembly in "complex" cases:
- Add Mul's body: count = (count + 3) * j;
- Pythagorean triples
However, it is opaque to LLVM's closed formula transformation, preventing eliding the loop altogether in "simple" cases:
- Straightforward sum.
- Triangle-like sum.

Also, it is notable that LLVM's closed formula transformation kicks in for for_each but not for an explicit loop when using a chained iterator to do the triangle sum by hand. This is indicative of a "hole" in the transformation, which I have yet to understand.

I guess either understanding or fixing this hole is the key to getting an implementation of inclusive ranges which both yields good assembly and let LLVM perform the closed formula transformation.

In the absence of such understanding/fixing, I would tend to prefer better straightforward assembly at the expense of the closed formula transformation: it is easier for the user to substitute a closed formula rather than re-implement an inclusive range, and I am doubtful that a closed formula exists in many cases.

I also have to revise my statement about performance; while on the Add Mul example, inclusive ranges perform as good as exclusive one, there is still some overhead remaining in the Pythagorean Triples case. It may simply be the slight overhead of the inner loop magnified by the number of times it is executed, of course, and this PR still significantly improves performance: from x1.91 to x1.27 slow-down.

Does anyone have any idea as to what could prevent LLVM from effecting the closed formula transformation?

matthieu-m · 2019-01-12T15:15:26Z

@kennytm As the author of the current version of RangeInclusive, do you remember if you had to do anything special to get LLVM to apply the closed formula transformation, which I believe to be the key to constant-folding?

kennytm · 2019-01-12T16:32:10Z

@matthieu-m I haven't investigated what causes LLVM to const-fold a loop, it just happened that the test works after tweaking the representation and putting #[inline] as many places as possible 😓.

matthieu-m · 2019-01-13T15:49:48Z

Updated performance number after specializing try_fold to make interior iteration more efficient¹² .

Benchmark	Exclusive	Chain	Inclusive	This PR
Sum	1.1385 ns	4.8235 ns	0.99411 ns	1.5881 ns
Triangle Foreach	1.1373 ns	1.5840 ns	0.98306 ns	1.5955 ns
Triangle Loop	1.1296 ns	1,167,000 ns	0.96986 ns	838,240 ns
Add Mul Foreach	1.1620 ms	1.2057 ms	1.1737 ms	1.1665 ms
Add Mul Loop	1.1660 ms	1.1693 ms	1.1693 ms	1.2504 ms
Pythagorean Triples	875.04 us	907.83 us	1,844.3 us	895.42 us

This reinforces the conclusion that the RangeInclusive code in this PR yields much better code for "complex" loops; notably, it finally catches up to the exclusive Range in the Pythagorean Triples example.

A custom try_fold now also allows it to catch up to Range when using internal iteration in "simple" loops, as we can seen using sum or for_each.

Unfortunately, it does nothing to improve the performance of "simple" loops using external iteration, where LLVM just fails to perform Loop Splitting and subsequently to transform the loop into a closed form.

My experiments with Loop Splitting have found it extremely finicky, with very similar cases falling on either side of the divide. This is pretty frustrating 😢

¹ The performance penalty observed is specific to the absence of Loop Splitting by LLVM; however in interior iteration we can manually split the loop between the loop itself and either a header or trailer, thereby gaining all our due performance without relying on getting lucky during optimizations.

² As a more general note, it also means that (a) it is likely beneficial to implement a specialized try_fold on many of the current iterators, such as Chain, if not done already and (b) it is likely beneficial to use internal iteration over external iteration in current iterators, for example in Filter::count which doesn't.

rust-highfive · 2019-01-13T16:47:09Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:065440f8:start=1547393674353503519,finish=1547393753035679132,duration=78682175613
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:10:57] 
[01:10:57] running 122 tests
[01:11:01] i..ii...iii..iiii.....i..............i..i......F.........i.....i......ii...i..i.ii..............i... 100/122
[01:11:01] failures:
[01:11:01] 
[01:11:01] ---- [codegen] codegen/issue-45222.rs stdout ----
[01:11:01] 
[01:11:01] 
[01:11:01] error: verification with 'FileCheck' failed
[01:11:01] status: exit code: 1
[01:11:01] command: "/usr/lib/llvm-6.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll" "/checkout/src/test/codegen/issue-45222.rs"
[01:11:01] ------------------------------------------
[01:11:01] 
[01:11:01] ------------------------------------------
[01:11:01] stderr:
[01:11:01] stderr:
[01:11:01] ------------------------------------------
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:23:12: error: expected string not found in input
[01:11:01]  // CHECK: ret i64 500005000000000
[01:11:01]            ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:10:23: note: scanning from here
[01:11:01] define i64 @check_foo2() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:11:01]                       ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:64:23: note: possible intended match here
[01:11:01]  %exitcond.i.1 = icmp eq i64 %7, 100000
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:41:12: error: expected string not found in input
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:41:12: error: expected string not found in input
[01:11:01]  // CHECK: ret i64 5000050000
[01:11:01]            ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:76:31: note: scanning from here
[01:11:01] define i64 @check_triangle_inc() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:11:01]                               ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:105:2: note: possible intended match here
[01:11:01]  ret i64 %count.0.i
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:61:12: error: expected string not found in input
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:61:12: error: expected string not found in input
[01:11:01]  // CHECK: ret i64 500050000000
[01:11:01]            ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:109:24: note: scanning from here
[01:11:01] define i64 @check_foo3r() unnamed_addr #1 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:11:01]                        ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:163:27: note: possible intended match here
[01:11:01]  %exitcond.i.i.i.1 = icmp eq i64 %14, 10000
[01:11:01] 
[01:11:01] ------------------------------------------
[01:11:01] 
[01:11:01] thread '[codegen] codegen/issue-45222.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:3245:9
---
[01:11:01] 
[01:11:01] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:495:22
[01:11:01] 
[01:11:01] 
[01:11:01] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:11:01] 
[01:11:01] 
[01:11:01] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:11:01] Build completed unsuccessfully in 0:11:57
[01:11:01] Build completed unsuccessfully in 0:11:57
[01:11:01] Makefile:48: recipe for target 'check' failed
[01:11:01] make: *** [check] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0f86a6d7
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 13 16:47:04 UTC 2019

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

rust-highfive · 2019-01-13T18:11:58Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:265246dc:start=1547398870285545705,finish=1547398940346930657,duration=70061384952
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:09:20] 
[01:09:20] running 122 tests
[01:09:23] i..ii...iii..iiii.....i..............i..i......F.........i.....i......ii...i..i.ii..............i... 100/122
[01:09:24] failures:
[01:09:24] 
[01:09:24] ---- [codegen] codegen/issue-45222.rs stdout ----
[01:09:24] 
[01:09:24] 
[01:09:24] error: verification with 'FileCheck' failed
[01:09:24] status: exit code: 1
[01:09:24] command: "/usr/lib/llvm-6.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll" "/checkout/src/test/codegen/issue-45222.rs"
[01:09:24] ------------------------------------------
[01:09:24] 
[01:09:24] ------------------------------------------
[01:09:24] stderr:
[01:09:24] stderr:
[01:09:24] ------------------------------------------
[01:09:24] /checkout/src/test/codegen/issue-45222.rs:23:12: error: expected string not found in input
[01:09:24]  // CHECK: ret i64 500005000000000
[01:09:24]            ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:10:23: note: scanning from here
[01:09:24] define i64 @check_foo2() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:09:24]                       ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:64:23: note: possible intended match here
[01:09:24]  %exitcond.i.1 = icmp eq i64 %7, 100000
[01:09:24] /checkout/src/test/codegen/issue-45222.rs:41:12: error: expected string not found in input
[01:09:24] /checkout/src/test/codegen/issue-45222.rs:41:12: error: expected string not found in input
[01:09:24]  // CHECK: ret i64 5000050000
[01:09:24]            ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:76:31: note: scanning from here
[01:09:24] define i64 @check_triangle_inc() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:09:24]                               ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:105:2: note: possible intended match here
[01:09:24]  ret i64 %count.0.i
[01:09:24] 
[01:09:24] ------------------------------------------
[01:09:24] 
[01:09:24] thread '[codegen] codegen/issue-45222.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:3245:9
---
[01:09:24] 
[01:09:24] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:495:22
[01:09:24] 
[01:09:24] 
[01:09:24] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:09:24] 
[01:09:24] 
[01:09:24] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:09:24] Build completed unsuccessfully in 0:11:13
[01:09:24] Build completed unsuccessfully in 0:11:13
[01:09:24] Makefile:48: recipe for target 'check' failed
[01:09:24] make: *** [check] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:11ced63a
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 13 18:11:53 UTC 2019
---
travis_time:end:06760162:start=1547403114594602927,finish=1547403114599052845,duration=4449918
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0234ee99
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|')

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

scottmcm · 2019-01-21T05:21:17Z

(a) it is likely beneficial to implement a specialized try_fold on many of the current iterators, such as Chain, if not done already

Many were -- including Chain, Filter, FilterMap, Enumerate, Scan, Fuse, and more -- in #45595 that added try_fold. There are probably still more, though, like VecDeque's.

hanna-kruppe · 2019-01-22T15:47:26Z

Sorry, it doesn't seem like I'll be able to give this PR proper attention in the near future. Please assign someone else.

scottmcm · 2019-01-22T21:19:59Z

r? @Kimundi

(For consistency with #56563 in the same area.)

Specialize Iterator::try_fold and DoubleEndedIterator::try_rfold to improve code generation in all internal iteration scenarios. This changes brings the performance of internal iteration with RangeInclusive on par with the performance of iteration with Range: - Single conditional jump in hot loop, - Unrolling and vectorization, - And even Closed Form substitution. Unfortunately, it only applies to internal iteration. Despite various attempts at stream-lining the implementation of next and next_back, LLVM has stubbornly refused to optimize external iteration appropriately, leaving me with a choice between: - The current implementation, for which Closed Form substitution is performed, but which uses 2 conditional jumps in the hot loop when optimization fail. - An implementation using a "is_done" boolean, which uses 1 conditional jump in the hot loop when optimization fail, allowing unrolling and vectorization, but for which Closed Form substitution fails. In the absence of any conclusive evidence as to which usecase matters most, and with no assurance that the lack of Closed Form substitution is not indicative of other optimizations being foiled, there is no way to pick one implementation over the other, and thus I defer to the statu quo as far as next and next_back are concerned.

matthieu-m · 2019-02-03T16:21:52Z

Unfortunately, I have yet to find a way to get LLVM to play nice with external iteration.

I'll open another PR to improve internal iteration; as force-push corrupted this one, it seems.

rust-highfive assigned hanna-kruppe Jan 6, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 6, 2019

scottmcm mentioned this pull request Jan 9, 2019

Override <RangeInclusive as Iterator>::try_(r)fold #56563

Closed

rust-highfive assigned Kimundi and unassigned hanna-kruppe Jan 22, 2019

matthieu-m force-pushed the range_incl_perf branch from 80aa9e4 to eb5b096 Compare February 3, 2019 16:17

matthieu-m closed this Feb 3, 2019

matthieu-m mentioned this pull request Feb 3, 2019

RangeInclusive internal iteration performance improvement. #58122

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RangeInclusive iteration performance improvement. #57378

RangeInclusive iteration performance improvement. #57378

matthieu-m commented Jan 6, 2019 •

edited

Loading

rust-highfive commented Jan 6, 2019

rust-highfive commented Jan 6, 2019

matthieu-m commented Jan 6, 2019 •

edited

Loading

ranma42 commented Jan 6, 2019

matthieu-m commented Jan 8, 2019

scottmcm commented Jan 9, 2019

matthieu-m commented Jan 12, 2019 •

edited

Loading

matthieu-m commented Jan 12, 2019

kennytm commented Jan 12, 2019

matthieu-m commented Jan 13, 2019 •

edited

Loading

rust-highfive commented Jan 13, 2019

rust-highfive commented Jan 13, 2019

scottmcm commented Jan 21, 2019

hanna-kruppe commented Jan 22, 2019

scottmcm commented Jan 22, 2019

matthieu-m commented Feb 3, 2019

RangeInclusive iteration performance improvement. #57378

RangeInclusive iteration performance improvement. #57378

Conversation

matthieu-m commented Jan 6, 2019 • edited Loading

rust-highfive commented Jan 6, 2019

rust-highfive commented Jan 6, 2019

matthieu-m commented Jan 6, 2019 • edited Loading

ranma42 commented Jan 6, 2019

matthieu-m commented Jan 8, 2019

scottmcm commented Jan 9, 2019

matthieu-m commented Jan 12, 2019 • edited Loading

matthieu-m commented Jan 12, 2019

kennytm commented Jan 12, 2019

matthieu-m commented Jan 13, 2019 • edited Loading

rust-highfive commented Jan 13, 2019

rust-highfive commented Jan 13, 2019

scottmcm commented Jan 21, 2019

hanna-kruppe commented Jan 22, 2019

scottmcm commented Jan 22, 2019

matthieu-m commented Feb 3, 2019

matthieu-m commented Jan 6, 2019 •

edited

Loading

matthieu-m commented Jan 6, 2019 •

edited

Loading

matthieu-m commented Jan 12, 2019 •

edited

Loading

matthieu-m commented Jan 13, 2019 •

edited

Loading