Improve autovectorization of to_lowercase / to_uppercase functions #123778

Open · wants to merge 2 commits into base: master

jhorstmann
Contributor

Refactor the code in the `convert_while_ascii` helper function to make it more suitable for auto-vectorization and also process the full ASCII prefix of the string. The generic case conversion logic will only be invoked starting from the first non-ASCII character.

The runtime on a microbenchmark with a small ASCII-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all unsafe code inside the helper function.
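
As a rough illustration of the approach described above (convert the whole ASCII prefix in fixed-size chunks, then hand the rest to the generic path), here is a minimal sketch in safe Rust. It is not the code from this PR: the function name `lowercase_ascii_prefix`, the chunk width of 16 and the return type are assumptions made for the example.

```rust
/// Hypothetical sketch (not the PR's code): lowercases the leading ASCII run
/// of `s` and returns it together with the unconverted remainder.
fn lowercase_ascii_prefix(s: &str) -> (String, &str) {
    // Assumed chunk width; 16 bytes corresponds to one 128-bit SIMD register.
    const N: usize = 16;
    let bytes = s.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());

    for chunk in bytes.chunks_exact(N) {
        // OR all bytes together: the high bit of the result is set if and
        // only if at least one byte in the chunk is non-ASCII.
        let combined = chunk.iter().fold(0u8, |acc, &b| acc | b);
        if combined >= 0x80 {
            break;
        }
        // Branch-free per-byte conversion over a fixed-length chunk.
        out.extend(chunk.iter().map(|b| b.to_ascii_lowercase()));
    }

    // Bytewise tail: finish the ASCII prefix up to the first non-ASCII byte.
    let mut i = out.len();
    while i < bytes.len() && bytes[i].is_ascii() {
        out.push(bytes[i].to_ascii_lowercase());
        i += 1;
    }

    // Every byte before `i` is ASCII, so `out` is valid UTF-8 and `i` lies on
    // a character boundary of `s`.
    let prefix = String::from_utf8(out).expect("ASCII bytes are valid UTF-8");
    (prefix, &s[i..])
}
```

A caller would run the generic, Unicode-aware conversion only on the returned remainder, which is empty for ASCII-only input.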

Fixes #123712

@rustbot
Collaborator

rustbot commented Apr 11, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot added the S-waiting-on-review and T-libs labels on Apr 11, 2024
@jhorstmann
Contributor Author

r? @the8472

The assembly for x86 and aarch64 can also be seen at https://rust.godbolt.org/z/x6T65nE8E

@Marcondiro
Contributor

Hi @jhorstmann do you have any benchmark for this? Thx!

@jhorstmann
Contributor Author

@Marcondiro only the microbenchmark included in this PR. On my machine (Intel i9-11900KB) the performance increases by nearly 3x. This is without any target-specific compiler flags. I am rerunning the benchmark now with:

./x.py bench library/alloc/ --stage 0 --test-args to_lowercase

Before

benchmarks:
    string::bench_to_lowercase 57.00ns/iter +/- 1.00ns

After

benchmarks:
    string::bench_to_lowercase 20.00ns/iter +/- 1.00ns

@Marcondiro

This comment was marked as outdated.

@jhorstmann
Contributor Author

Thanks for running the benchmarks, glad that there is no regression on Arm. The improvement on x86 mostly comes from the use of the pmovmskb instruction; the equivalent on Arm requires multiple instructions with higher latency and lower throughput.
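
For readers unfamiliar with pmovmskb: it gathers the most significant bit of every byte in a vector register into an integer mask, so a single instruction answers "is any byte non-ASCII, and where is the first one". The scalar model below is only a sketch of that semantics for a 16-byte register; the names `movemask` and `first_non_ascii` are made up for the example and the code says nothing about what the compiler actually emits.

```rust
/// Scalar model of a 16-byte `pmovmskb`: bit `i` of the result is the most
/// significant bit of byte `i`. (Hypothetical helper for illustration.)
fn movemask(chunk: &[u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= u16::from(b >> 7) << i;
    }
    mask
}

/// Index of the first non-ASCII byte in the chunk, if there is one.
fn first_non_ascii(chunk: &[u8; 16]) -> Option<usize> {
    let mask = movemask(chunk);
    if mask == 0 {
        None
    } else {
        // The lowest set bit corresponds to the first byte with its high bit set.
        Some(mask.trailing_zeros() as usize)
    }
}
```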

@Marcondiro

This comment was marked as outdated.

@jhorstmann force-pushed the optimize-upper-lower-auto-vectorization branch from f20f9e6 to 2a52a46 on May 9, 2024 12:49
@jhorstmann
Contributor Author

Thanks again, I was able to reproduce this on an AWS c7g instance. It seems there were some bounds checks remaining in the generated assembly, which were removed by the compiler on x86_64 and also in the simplified version I checked in Compiler Explorer. Adding some assume calls fixes that, and the benchmarks on c7g / Neoverse-V1 are now:
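
To show what an "assume" can look like, here is a minimal sketch that passes a known invariant to the optimizer via `std::hint::unreachable_unchecked`, so that the slice indexing inside the loop does not need its own bounds check. The function `lowercase_chunks`, its signature and the chunk width are assumptions for this example; the PR may express its hints differently.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical sketch: lowercases whole 16-byte chunks of `src` into `dst`
/// and returns the number of bytes converted.
fn lowercase_chunks(src: &[u8], dst: &mut [u8]) -> usize {
    const N: usize = 16;
    // Checked once, outside the hot loop.
    assert!(dst.len() >= src.len());

    let mut i = 0;
    while i + N <= src.len() {
        // "Assume": tell the optimizer that the destination range is in bounds.
        // SAFETY: the loop condition gives `i + N <= src.len()`, and the
        // assert above gives `src.len() <= dst.len()`, so this branch can
        // never be taken.
        if i + N > dst.len() {
            unsafe { unreachable_unchecked() }
        }
        let input = &src[i..i + N];
        let output = &mut dst[i..i + N];
        for (o, b) in output.iter_mut().zip(input) {
            *o = b.to_ascii_lowercase();
        }
        i += N;
    }
    i
}
```

The hint is only sound because the condition really is unreachable; a wrong assumption here would be undefined behavior, which is one reason to keep such code inside a single helper.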

Before

benchmarks:
    string::bench_to_lowercase 61.00ns/iter +/- 0.00ns

After

benchmarks:
    string::bench_to_lowercase 28.00ns/iter +/- 0.00ns

@bors
Contributor

bors commented May 9, 2024

☔ The latest upstream changes (presumably #124773) made this pull request unmergeable. Please resolve the merge conflicts.

@jhorstmann force-pushed the optimize-upper-lower-auto-vectorization branch from 2a52a46 to 6c58e74 on May 10, 2024 19:55
@jhorstmann
Contributor Author

@the8472 can you take a look at this PR? It might have been lost in the review queue.

@Marcondiro
Contributor

A couple of thoughts:

  • A similar thing is done also here:
    /// Optimized ASCII test that will use usize-at-a-time operations instead of
    /// byte-at-a-time operations (when possible).
    ///
    /// The algorithm we use here is pretty simple. If `s` is too short, we just
    /// check each byte and be done with it. Otherwise:
    ///
    /// - Read the first word with an unaligned load.
    /// - Align the pointer, read subsequent words until end with aligned loads.
    /// - Read the last `usize` from `s` with an unaligned load.
    ///
    /// If any of these loads produces something for which `contains_nonascii`
    /// (above) returns true, then we know the answer is false.
    #[inline]
    const fn is_ascii(s: &[u8]) -> bool {
    Maybe it would be nice to somehow merge the two?
  • It would be interesting to understand why the compiler doesn't use pmovmskb when the code is not written in this specific way (e.g. using & 0x8080808080808080)
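
The word-at-a-time idea from the quoted doc comment, together with the 0x8080808080808080 mask mentioned above, can be sketched as follows. This is a simplified illustration rather than the `is_ascii` implementation in core: it uses `u64` instead of `usize`, copies each chunk into a buffer instead of doing aligned and unaligned loads, and the name `is_ascii_wordwise` is made up.

```rust
/// One high bit per byte: any non-ASCII byte sets at least one of these bits.
const NONASCII_MASK: u64 = 0x8080_8080_8080_8080;

/// Hypothetical sketch: tests eight bytes per step, then the leftover bytes
/// one at a time.
fn is_ascii_wordwise(s: &[u8]) -> bool {
    let mut chunks = s.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // Byte order is irrelevant because the mask is the same for every byte.
        let word = u64::from_ne_bytes(buf);
        if word & NONASCII_MASK != 0 {
            return false;
        }
    }
    // Fewer than eight bytes remain; check them individually.
    chunks.remainder().iter().all(|b| b.is_ascii())
}
```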

@rustbot added S-waiting-on-author and removed S-waiting-on-review on Jun 1, 2024
@jhorstmann force-pushed the optimize-upper-lower-auto-vectorization branch from 6c58e74 to b03d939 on June 2, 2024 20:49
@jhorstmann
Contributor Author

jhorstmann commented Jun 2, 2024

Updated benchmark results:

master (#eda9d7f987de76b9d61c633a6ac328936e1b94f0)
benchmarks:
    str::to_lowercase::long_lorem_ipsum  534.48/iter +/- 33.86
    str::to_lowercase::short_ascii        20.11/iter  +/- 0.53
    str::to_lowercase::short_mixed       240.21/iter  +/- 3.40
    str::to_lowercase::short_pile_of_poo 262.54/iter  +/- 8.33

PR (#b03d93962816fd82afb619e0cf2083dc67e218e8)
benchmarks:
    str::to_lowercase::long_lorem_ipsum  148.49/iter +/- 1.24
    str::to_lowercase::short_ascii        12.14/iter +/- 0.13
    str::to_lowercase::short_mixed       240.31/iter +/- 3.51
    str::to_lowercase::short_pile_of_poo 259.42/iter +/- 7.45

Benchmark results in the initial PR comments used a different input; these use the same input strings as several existing benchmarks. I have not yet gotten around to rerunning on aarch64.

Update: The above results were with -Ctarget-cpu=native. With the default target the improvements are bigger:

master
    str::to_lowercase::long_lorem_ipsum  1027.63/iter +/- 39.76
    str::to_lowercase::short_ascii         34.39/iter  +/- 0.73
    str::to_lowercase::short_mixed        262.95/iter +/- 16.07
    str::to_lowercase::short_pile_of_poo  261.71/iter +/- 15.26
PR
    str::to_lowercase::long_lorem_ipsum  175.39/iter  +/- 3.18
    str::to_lowercase::short_ascii        12.25/iter  +/- 0.30
    str::to_lowercase::short_mixed       237.15/iter +/- 11.36
    str::to_lowercase::short_pile_of_poo 262.57/iter  +/- 8.37

@Marcondiro
Contributor

Benchmark results on aarch64 (Apple M1)

master:
    str::to_lowercase::long_lorem_ipsum  523.29/iter  +/- 3.10
    str::to_lowercase::short_ascii        33.31/iter  +/- 0.29
    str::to_lowercase::short_mixed       299.64/iter +/- 23.58
    str::to_lowercase::short_pile_of_poo 295.03/iter  +/- 9.97
PR:
    str::to_lowercase::long_lorem_ipsum  129.03/iter  +/- 2.37
    str::to_lowercase::short_ascii        23.27/iter  +/- 0.49
    str::to_lowercase::short_mixed       271.17/iter +/- 31.33
    str::to_lowercase::short_pile_of_poo 272.03/iter  +/- 3.60

Great results even without pmovmskb 👍

@the8472
Member

the8472 commented Jun 25, 2024

@bors r+

@bors
Contributor

bors commented Sep 18, 2024

📌 Commit 58b23cb has been approved by the8472

It is now in the queue for this repository.

@bors added S-waiting-on-bors and removed S-waiting-on-author on Sep 18, 2024
@jhorstmann
Contributor Author

@the8472 thank you for your patience. I updated the PR and also squashed the previous commits. I could not get the codegen tests to work with test revisions, since they require std or at least the core library. All other codegen tests with revisions that I looked at are actually using no_core, probably for that reason. Instead, the test now only runs on x86-64. I think this should be OK, since autovectorization at the LLVM IR level should be mostly backend-independent, assuming the backend has some form of SIMD support.

Another change is that I removed the code duplication in the codegen test and instead made convert_while_ascii public under the str_internals feature. There is precedent for this in #111222 for the is_ascii_simple function.

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 19, 2024
…-vectorization, r=the8472

Improve autovectorization of to_lowercase / to_uppercase functions

Refactor the code in the `convert_while_ascii` helper function to make it more suitable for auto-vectorization and also process the full ascii prefix of the string. The generic case conversion logic will only be invoked starting from the first non-ascii character.

The runtime on a microbenchmark with a small ascii-only input decreases from ~55ns to ~18ns per iteration. The new implementation also reduces the amount of unsafe code and encapsulates all unsafe inside the helper function.

Fixes rust-lang#123712
@bors
Contributor

bors commented Sep 19, 2024

⌛ Testing commit 58b23cb with merge 2136b65...

@bors
Contributor

bors commented Sep 19, 2024

💔 Test failed - checks-actions

@bors added S-waiting-on-review and removed S-waiting-on-bors on Sep 19, 2024
@rust-log-analyzer

This comment has been minimized.

@the8472
Member

the8472 commented Sep 19, 2024

looks like a network error

@bors retry

@bors added S-waiting-on-bors and removed S-waiting-on-review on Sep 19, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 19, 2024
…-vectorization, r=the8472

@bors
Contributor

bors commented Sep 19, 2024

⌛ Testing commit 58b23cb with merge 2c75a64...

@rust-log-analyzer

This comment has been minimized.

@bors
Contributor

bors commented Sep 19, 2024

💔 Test failed - checks-actions

@bors added S-waiting-on-review and removed S-waiting-on-bors on Sep 19, 2024
@the8472
Member

the8472 commented Sep 19, 2024

@bors r+

@bors
Contributor

bors commented Sep 19, 2024

📌 Commit 60a13dd has been approved by the8472

It is now in the queue for this repository.

@bors added S-waiting-on-bors and removed S-waiting-on-review on Sep 19, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 19, 2024
…-vectorization, r=the8472

@bors
Contributor

bors commented Sep 19, 2024

⌛ Testing commit 60a13dd with merge dac43b914138744e2d158dbf140385a5ffd62638...

@rust-log-analyzer
Collaborator

The job x86_64-gnu-llvm-19 failed!

Possible cause of the failure (guessed by this bot):
failures:

---- [codegen] tests/codegen/issues/issue-111508-vec-tryinto-array.rs stdout ----

error: verification with 'FileCheck' failed
status: exit status: 1
command: "/usr/lib/llvm-19/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-111508-vec-tryinto-array/issue-111508-vec-tryinto-array.ll" "/checkout/tests/codegen/issues/issue-111508-vec-tryinto-array.rs" "--check-prefix=CHECK" "--check-prefix" "NONMSVC" "--allow-unused-prefixes" "--dump-input-context" "100"
--- stderr -------------------------------
/checkout/tests/codegen/issues/issue-111508-vec-tryinto-array.rs:12:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: unwrap_failed
              ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-111508-vec-tryinto-array/issue-111508-vec-tryinto-array.ll:162:24: note: found here
; invoke core::result::unwrap_failed

Input file: /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-111508-vec-tryinto-array/issue-111508-vec-tryinto-array.ll
Check file: /checkout/tests/codegen/issues/issue-111508-vec-tryinto-array.rs


-dump-input=help explains the following input dump.
Input was:
<<<<<<
        .
        .
        .
        .
       62:  tail call void @__rust_dealloc(ptr noundef nonnull %self4.i.i.i.i3, i64 noundef %_5.i.i.i.i1, i64 noundef 1) #8, !noalias !24 
       63:  br label %"_ZN4core3ptr53drop_in_place$LT$alloc..raw_vec..RawVec$LT$u8$GT$$GT$17h6c178d1dea818e7dE.exit4" 
       64:  
       65: "_ZN4core3ptr53drop_in_place$LT$alloc..raw_vec..RawVec$LT$u8$GT$$GT$17h6c178d1dea818e7dE.exit4": ; preds = %bb4, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2" 
       66:  ret void 
       67: } 
       68:  
       69: ; <alloc::vec::Vec<T,A> as core::fmt::Debug>::fmt 
       70: ; Function Attrs: nonlazybind uwtable 
       71: define internal noundef zeroext i1 @"_ZN65_$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..fmt..Debug$GT$3fmt17h071a08ce47b992ffE"(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %self, ptr noalias noundef align 8 dereferenceable(64) %f) unnamed_addr #0 personality ptr @rust_eh_personality { 
       72: start: 
       73:  %entry.i.i = alloca [8 x i8], align 8 
       74:  %_5.i = alloca [16 x i8], align 8 
       75:  %self1 = load ptr, ptr %self, align 8, !nonnull !3, !noundef !3 
       76:  %0 = getelementptr inbounds i8, ptr %self, i64 16 
       77:  %len = load i64, ptr %0, align 8, !noundef !3 
       78:  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_5.i), !noalias !25 
       79: ; call core::fmt::Formatter::debug_list 
       80:  call void @_ZN4core3fmt9Formatter10debug_list17h4f2f427b0842a3ebE(ptr noalias nocapture noundef nonnull sret([16 x i8]) align 8 dereferenceable(16) %_5.i, ptr noalias noundef nonnull align 8 dereferenceable(64) %f), !noalias !29 
       81:  %_11.i = getelementptr inbounds i8, ptr %self1, i64 %len 
       82:  %1 = icmp eq i64 %len, 0 
       83:  br i1 %1, label %"_ZN48_$LT$$u5b$T$u5d$$u20$as$u20$core..fmt..Debug$GT$3fmt17hd710647428764915E.exit", label %bb5.i.i 
       84:  
       85: bb5.i.i: ; preds = %start, %bb5.i.i 
       86:  %iter.sroa.4.06.i.i = phi ptr [ %_24.i.i.i, %bb5.i.i ], [ %self1, %start ] 
       87:  %_24.i.i.i = getelementptr inbounds i8, ptr %iter.sroa.4.06.i.i, i64 1 
       88:  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %entry.i.i), !noalias !30 
       89:  store ptr %iter.sroa.4.06.i.i, ptr %entry.i.i, align 8, !noalias !30 
       90: ; call core::fmt::builders::DebugList::entry 
       91:  %_9.i.i = call noundef align 8 dereferenceable(16) ptr @_ZN4core3fmt8builders9DebugList5entry17h0e93e15c1edda619E(ptr noalias noundef nonnull align 8 dereferenceable(16) %_5.i, ptr noundef nonnull align 1 %entry.i.i, ptr noalias noundef nonnull readonly align 8 dereferenceable(32) @vtable.0) 
       92:  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %entry.i.i), !noalias !30 
       93:  %2 = icmp eq ptr %_24.i.i.i, %_11.i 
       94:  br i1 %2, label %"_ZN48_$LT$$u5b$T$u5d$$u20$as$u20$core..fmt..Debug$GT$3fmt17hd710647428764915E.exit", label %bb5.i.i 
       95:  
       96: "_ZN48_$LT$$u5b$T$u5d$$u20$as$u20$core..fmt..Debug$GT$3fmt17hd710647428764915E.exit": ; preds = %bb5.i.i, %start 
       97: ; call core::fmt::builders::DebugList::finish 
       98:  %_0.i = call noundef zeroext i1 @_ZN4core3fmt8builders9DebugList6finish17hc2340632c9bfa6bfE(ptr noalias noundef nonnull align 8 dereferenceable(16) %_5.i) 
       99:  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %_5.i), !noalias !25 
      100:  ret i1 %_0.i 
      101: } 
      102:  
      103: ; Function Attrs: nonlazybind uwtable 
      104: define noundef i8 @example(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %a) unnamed_addr #0 personality ptr @rust_eh_personality { 
      105: start: 
      106:  %e.i = alloca [24 x i8], align 8 
      107:  %_5.sroa.5 = alloca [16 x i8], align 8 
      108:  %0 = getelementptr inbounds i8, ptr %a, i64 16 
      109:  %_2 = load i64, ptr %0, align 8, !noundef !3 
      110:  %1 = icmp eq i64 %_2, 32 
      111:  br i1 %1, label %bb2, label %bb1 
      112:  
      113: bb2: ; preds = %start 
      114:  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_5.sroa.5) 
      115:  %_5.sroa.0.0.copyload = load ptr, ptr %a, align 8 
      116:  %_5.sroa.5.0.a.sroa_idx = getelementptr inbounds i8, ptr %a, i64 8 
      117:  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %_5.sroa.5, ptr noundef nonnull align 8 dereferenceable(16) %_5.sroa.5.0.a.sroa_idx, i64 16, i1 false) 
      118:  tail call void @llvm.experimental.noalias.scope.decl(metadata !33) 
      119:  tail call void @llvm.experimental.noalias.scope.decl(metadata !36) 
      120:  %_5.sroa.5.8.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.5, i64 8 
      121:  %_5.sroa.5.8._5.sroa.5.8._5.sroa.5.8._5.sroa.5.16._3.i = load i64, ptr %_5.sroa.5.8.sroa_idx, align 8 
      122:  %_2.not.i = icmp eq i64 %_5.sroa.5.8._5.sroa.5.8._5.sroa.5.8._5.sroa.5.16._3.i, 32 
      123:  br i1 %_2.not.i, label %bb6.i, label %bb2.i 
      124:  
      125: bb6.i: ; preds = %bb2 
      126:  %2 = icmp ne ptr %_5.sroa.0.0.copyload, null 
      127:  tail call void @llvm.assume(i1 %2) 
      128:  %_4.sroa.9.1.self.i.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.0.0.copyload, i64 15 
      129:  %_4.sroa.9.1.copyload = load i8, ptr %_4.sroa.9.1.self.i.sroa_idx, align 1, !noalias !36 
      130:  %_4.sroa.11.1.self.i.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.0.0.copyload, i64 24 
      131:  %_4.sroa.11.1.copyload = load i8, ptr %_4.sroa.11.1.self.i.sroa_idx, align 1, !noalias !36 
      132:  tail call void @llvm.experimental.noalias.scope.decl(metadata !38) 
      133:  tail call void @llvm.experimental.noalias.scope.decl(metadata !41) 
      134:  tail call void @llvm.experimental.noalias.scope.decl(metadata !44) 
      135:  tail call void @llvm.experimental.noalias.scope.decl(metadata !47) 
      136:  %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._5.i.i.i.i1.i.i = load i64, ptr %_5.sroa.5, align 8, !alias.scope !50, !noalias !53 
      137:  %3 = icmp eq i64 %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._5.i.i.i.i1.i.i, 0 
      138:  br i1 %3, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit", label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i.i" 
      139:  
      140: "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i.i": ; preds = %bb6.i 
      141:  tail call void @__rust_dealloc(ptr noundef nonnull %_5.sroa.0.0.copyload, i64 noundef %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._5.i.i.i.i1.i.i, i64 noundef 1) #8, !noalias !55 
      142:  br label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit" 
      143:  
      144: bb2.i: ; preds = %bb2 
      145:  %4 = lshr i64 %_5.sroa.5.8._5.sroa.5.8._5.sroa.5.8._5.sroa.5.16._3.i, 8 
      146:  %5 = trunc i64 %4 to i8 
      147:  %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._4.sroa.9.8.copyload8 = load i8, ptr %_5.sroa.5, align 8, !alias.scope !56 
      148:  %_5.sroa.5.1.sroa_idx = getelementptr inbounds i8, ptr %_5.sroa.5, i64 1 
      149:  %_5.sroa.5.1._5.sroa.5.1._5.sroa.5.1._5.sroa.5.9._4.sroa.10.8.copyload9 = load i64, ptr %_5.sroa.5.1.sroa_idx, align 1, !alias.scope !56 
      150:  %6 = getelementptr inbounds i8, ptr %a, i64 18 
      151:  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %_5.sroa.5) 
      152:  call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %e.i), !noalias !57 
      153:  store ptr %_5.sroa.0.0.copyload, ptr %e.i, align 8, !noalias !61 
      154:  %_4.sroa.9.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 8 
      155:  store i8 %_5.sroa.5.0._5.sroa.5.0._5.sroa.5.0._5.sroa.5.8._4.sroa.9.8.copyload8, ptr %_4.sroa.9.8.e.i.sroa_idx, align 8, !noalias !61 
      156:  %_4.sroa.10.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 9 
      157:  store i64 %_5.sroa.5.1._5.sroa.5.1._5.sroa.5.1._5.sroa.5.9._4.sroa.10.8.copyload9, ptr %_4.sroa.10.8.e.i.sroa_idx, align 1, !noalias !61 
      158:  %_4.sroa.11.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 17 
      159:  store i8 %5, ptr %_4.sroa.11.8.e.i.sroa_idx, align 1, !noalias !61 
      160:  %_4.sroa.12.8.e.i.sroa_idx = getelementptr inbounds i8, ptr %e.i, i64 18 
      161:  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 2 dereferenceable(6) %_4.sroa.12.8.e.i.sroa_idx, ptr noundef nonnull align 2 dereferenceable(6) %6, i64 6, i1 false) 
      162: ; invoke core::result::unwrap_failed 
not:12                            !~~~~~~~~~~~~  error: no match expected
      163:  invoke void @_ZN4core6result13unwrap_failed17hda82ba412d85e1ccE(ptr noalias noundef nonnull readonly align 1 @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef 43, ptr noundef nonnull align 1 %e.i, ptr noalias noundef nonnull readonly align 8 dereferenceable(32) @vtable.1, ptr noalias noundef nonnull readonly align 8 dereferenceable(32) @alloc_fdf1acc6022b418ce8fc31757b7c7478) #9 
      164:  to label %unreachable.i unwind label %cleanup.i, !noalias !57 
      165:  
      166: cleanup.i: ; preds = %bb2.i 
      167:  %7 = landingpad { ptr, i32 } 
      168:  cleanup 
      169:  call void @llvm.experimental.noalias.scope.decl(metadata !62) 
      170:  call void @llvm.experimental.noalias.scope.decl(metadata !65), !noalias !57 
      171:  call void @llvm.experimental.noalias.scope.decl(metadata !68), !noalias !57 
      172:  call void @llvm.experimental.noalias.scope.decl(metadata !71), !noalias !57 
      173:  %_5.i.i.i.i1.i = load i64, ptr %_4.sroa.9.8.e.i.sroa_idx, align 8, !alias.scope !74, !noalias !77 
      174:  %8 = icmp eq i64 %_5.i.i.i.i1.i, 0 
      175:  br i1 %8, label %bb5.i, label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i" 
      176:  
      177: "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i": ; preds = %cleanup.i 
      178:  %self4.i.i.i.i3.i = load ptr, ptr %e.i, align 8, !alias.scope !74, !noalias !77, !nonnull !3, !noundef !3 
      179:  call void @__rust_dealloc(ptr noundef nonnull %self4.i.i.i.i3.i, i64 noundef %_5.i.i.i.i1.i, i64 noundef 1) #8, !noalias !79 
      180:  br label %bb5.i 
      181:  
      182: unreachable.i: ; preds = %bb2.i 
      183:  unreachable 
      184:  
      185: bb5.i: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i", %cleanup.i 
      186:  resume { ptr, i32 } %7 
      187:  
      188: "_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit": ; preds = %bb6.i, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i.i" 
      189:  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %_5.sroa.5) 
      190:  %9 = add i8 %_4.sroa.11.1.copyload, %_4.sroa.9.1.copyload 
      191:  br label %bb4 
      192:  
      193: bb1: ; preds = %start 
      194:  tail call void @llvm.experimental.noalias.scope.decl(metadata !80) 
      195:  tail call void @llvm.experimental.noalias.scope.decl(metadata !83) 
      196:  tail call void @llvm.experimental.noalias.scope.decl(metadata !86) 
      197:  tail call void @llvm.experimental.noalias.scope.decl(metadata !89) 
      198:  %10 = getelementptr inbounds i8, ptr %a, i64 8 
      199:  %_5.i.i.i.i1.i2 = load i64, ptr %10, align 8, !alias.scope !92, !noalias !95 
      200:  %11 = icmp eq i64 %_5.i.i.i.i1.i2, 0 
      201:  br i1 %11, label %bb4, label %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3" 
      202:  
      203: "_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3": ; preds = %bb1 
      204:  %self4.i.i.i.i3.i4 = load ptr, ptr %a, align 8, !alias.scope !92, !noalias !95, !nonnull !3, !noundef !3 
      205:  tail call void @__rust_dealloc(ptr noundef nonnull %self4.i.i.i.i3.i4, i64 noundef %_5.i.i.i.i1.i2, i64 noundef 1) #8, !noalias !97 
      206:  br label %bb4 
      207:  
      208: bb4: ; preds = %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3", %bb1, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit" 
      209:  %_0.sroa.0.0 = phi i8 [ %9, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h443f41158dd09e33E.exit" ], [ 0, %bb1 ], [ 0, %"_ZN63_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$10deallocate17h2e446bd2f697df1bE.exit.i.i.i2.i3" ] 
      210:  ret i8 %_0.sroa.0.0 
      211: } 
      213: ; core::fmt::Formatter::debug_list 
      214: ; Function Attrs: nonlazybind uwtable 
      215: declare void @_ZN4core3fmt9Formatter10debug_list17h4f2f427b0842a3ebE(ptr dead_on_unwind noalias nocapture noundef writable sret([16 x i8]) align 8 dereferenceable(16), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      217: ; core::fmt::builders::DebugList::finish 
      218: ; Function Attrs: nonlazybind uwtable 
      219: declare noundef zeroext i1 @_ZN4core3fmt8builders9DebugList6finish17hc2340632c9bfa6bfE(ptr noalias noundef align 8 dereferenceable(16)) unnamed_addr #0 
      220:  
      221: ; core::fmt::num::imp::<impl core::fmt::Display for u8>::fmt 
      222: ; Function Attrs: nonlazybind uwtable 
      223: declare noundef zeroext i1 @"_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17h40283dbe3b45cc8aE"(ptr noalias noundef readonly align 1 dereferenceable(1), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      224:  
      225: ; core::fmt::num::<impl core::fmt::UpperHex for u8>::fmt 
      226: ; Function Attrs: nonlazybind uwtable 
      227: declare noundef zeroext i1 @"_ZN4core3fmt3num52_$LT$impl$u20$core..fmt..UpperHex$u20$for$u20$u8$GT$3fmt17hbb637f4ec75fe0b4E"(ptr noalias noundef readonly align 1 dereferenceable(1), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      228:  
      229: ; core::fmt::num::<impl core::fmt::LowerHex for u8>::fmt 
      230: ; Function Attrs: nonlazybind uwtable 
      231: declare noundef zeroext i1 @"_ZN4core3fmt3num52_$LT$impl$u20$core..fmt..LowerHex$u20$for$u20$u8$GT$3fmt17hbae9e1526842e35cE"(ptr noalias noundef readonly align 1 dereferenceable(1), ptr noalias noundef align 8 dereferenceable(64)) unnamed_addr #0 
      232:  
      233: ; Function Attrs: nounwind nonlazybind uwtable 
      234: declare noundef range(i32 0, 10) i32 @rust_eh_personality(i32 noundef, i32 noundef range(i32 1, 17), i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1 
      236: ; core::fmt::builders::DebugList::entry 
      237: ; Function Attrs: nonlazybind uwtable 
      238: declare noundef align 8 dereferenceable(16) ptr @_ZN4core3fmt8builders9DebugList5entry17h0e93e15c1edda619E(ptr noalias noundef align 8 dereferenceable(16), ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(32)) unnamed_addr #0 
      239:  
      240: ; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: readwrite) 
      241: declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #2 
      243: ; core::result::unwrap_failed 
      244: ; Function Attrs: cold noinline noreturn nonlazybind uwtable 
      245: declare void @_ZN4core6result13unwrap_failed17hda82ba412d85e1ccE(ptr noalias noundef nonnull readonly align 1, i64 noundef, ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(32), ptr noalias noundef readonly align 8 dereferenceable(32)) unnamed_addr #3 
      246:  
      247: ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) 
      248: declare void @llvm.assume(i1 noundef) #4 
      249:  
      250: ; Function Attrs: nounwind nonlazybind allockind("free") uwtable 
      251: declare void @__rust_dealloc(ptr allocptr noundef, i64 noundef, i64 noundef) unnamed_addr #5 
      252:  
      253: ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) 
      254: declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #6 
      255:  
      256: ; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) 
      257: declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #6 
      258:  
      259: ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) 
      260: declare void @llvm.experimental.noalias.scope.decl(metadata) #7 
      261:  
      262: attributes #0 = { nonlazybind uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" } 
        .
        .
>>>>>>
------------------------------------------

@bors
Contributor

bors commented Sep 20, 2024

💔 Test failed - checks-actions

@bors added S-waiting-on-review and removed S-waiting-on-bors on Sep 20, 2024
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

String::to_lowercase does not get vectorized well contrary to code comments
8 participants