-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rustc_span][perf] Remove unnecessary string joins and allocs. #114351
Conversation
Comparing vectors of string parts yields the same result but avoids unnecessary `join` and potential allocation for resulting `String`. This code is cold so it's unlikely to have any measurable impact, but considering but since it's also simpler, why not? :)
r? @fee1-dead (rustbot has picked a reviewer for you, use r? to override) |
Do you know if the call site immediately above rust/compiler/rustc_span/src/edit_distance.rs Lines 240 to 250 in 7a5d2d0
sort_by_words once per iteration for the same lookup input or is it LICMed away ?
|
I assumed that this fold would be desugared into a loop and compiler wouldn't have much difficulties to hoist this computation out of the loop. But apparently I was wrong according to https://godbolt.org/z/frs8Kj1rq
2 calls to
Amusingly the total number of assembly lines is the same, but one of the versions clearly seems to be more performant :) In any case, since hosting is not related to current change, I'll send a separate PR once this one is merged. |
@lqd commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to https://godbolt.org/z/frs8Kj1rq it seems like I was wrong: original version seems to have 2 calls per loop iteration ``` .LBB16_3: mov rbx, qword ptr [r13] mov r14, qword ptr [r13 + 8] lea rdi, [rsp + 40] mov rsi, rbx mov rdx, r14 call example::sort_by_words lea rdi, [rsp + 64] mov rsi, qword ptr [rsp + 8] mov rdx, qword ptr [rsp + 16] call example::sort_by_words mov rdi, qword ptr [rsp + 40] mov rdx, qword ptr [rsp + 56] mov rsi, qword ptr [rsp + 64] cmp rdx, qword ptr [rsp + 80] mov qword ptr [rsp + 32], rdi mov qword ptr [rsp + 24], rsi jne .LBB16_5 call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete al mov dword ptr [rsp + 4], eax mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` but the manually hoisted version just 1: ``` .LBB16_3: mov r13, qword ptr [r15] mov r14, qword ptr [r15 + 8] lea rdi, [rsp + 64] mov rsi, r13 mov rdx, r14 call example::sort_by_words mov rdi, qword ptr [rsp + 64] mov rdx, qword ptr [rsp + 16] cmp qword ptr [rsp + 80], rdx mov qword ptr [rsp + 32], rdi jne .LBB16_5 mov rsi, qword ptr [rsp + 8] call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete bpl mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
[rustc_span][perf] Hoist lookup sorted by words out of the loop. `@lqd` commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to https://godbolt.org/z/frs8Kj1rq it seems like I was wrong: original version seems to have 2 calls per loop iteration ``` .LBB16_3: mov rbx, qword ptr [r13] mov r14, qword ptr [r13 + 8] lea rdi, [rsp + 40] mov rsi, rbx mov rdx, r14 call example::sort_by_words lea rdi, [rsp + 64] mov rsi, qword ptr [rsp + 8] mov rdx, qword ptr [rsp + 16] call example::sort_by_words mov rdi, qword ptr [rsp + 40] mov rdx, qword ptr [rsp + 56] mov rsi, qword ptr [rsp + 64] cmp rdx, qword ptr [rsp + 80] mov qword ptr [rsp + 32], rdi mov qword ptr [rsp + 24], rsi jne .LBB16_5 call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete al mov dword ptr [rsp + 4], eax mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` but the manually hoisted version just 1: ``` .LBB16_3: mov r13, qword ptr [r15] mov r14, qword ptr [r15 + 8] lea rdi, [rsp + 64] mov rsi, r13 mov rdx, r14 call example::sort_by_words mov rdi, qword ptr [rsp + 64] mov rdx, qword ptr [rsp + 16] cmp qword ptr [rsp + 80], rdx mov qword ptr [rsp + 32], rdi jne .LBB16_5 mov rsi, qword ptr [rsp + 8] call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete bpl mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
[rustc_span][perf] Hoist lookup sorted by words out of the loop. ``@lqd`` commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to https://godbolt.org/z/frs8Kj1rq it seems like I was wrong: original version seems to have 2 calls per loop iteration ``` .LBB16_3: mov rbx, qword ptr [r13] mov r14, qword ptr [r13 + 8] lea rdi, [rsp + 40] mov rsi, rbx mov rdx, r14 call example::sort_by_words lea rdi, [rsp + 64] mov rsi, qword ptr [rsp + 8] mov rdx, qword ptr [rsp + 16] call example::sort_by_words mov rdi, qword ptr [rsp + 40] mov rdx, qword ptr [rsp + 56] mov rsi, qword ptr [rsp + 64] cmp rdx, qword ptr [rsp + 80] mov qword ptr [rsp + 32], rdi mov qword ptr [rsp + 24], rsi jne .LBB16_5 call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete al mov dword ptr [rsp + 4], eax mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` but the manually hoisted version just 1: ``` .LBB16_3: mov r13, qword ptr [r15] mov r14, qword ptr [r15 + 8] lea rdi, [rsp + 64] mov rsi, r13 mov rdx, r14 call example::sort_by_words mov rdi, qword ptr [rsp + 64] mov rdx, qword ptr [rsp + 16] cmp qword ptr [rsp + 80], rdx mov qword ptr [rsp + 32], rdi jne .LBB16_5 mov rsi, qword ptr [rsp + 8] call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete bpl mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
[rustc_span][perf] Hoist lookup sorted by words out of the loop. ```@lqd``` commented on rust-lang#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to https://godbolt.org/z/frs8Kj1rq it seems like I was wrong: original version seems to have 2 calls per loop iteration ``` .LBB16_3: mov rbx, qword ptr [r13] mov r14, qword ptr [r13 + 8] lea rdi, [rsp + 40] mov rsi, rbx mov rdx, r14 call example::sort_by_words lea rdi, [rsp + 64] mov rsi, qword ptr [rsp + 8] mov rdx, qword ptr [rsp + 16] call example::sort_by_words mov rdi, qword ptr [rsp + 40] mov rdx, qword ptr [rsp + 56] mov rsi, qword ptr [rsp + 64] cmp rdx, qword ptr [rsp + 80] mov qword ptr [rsp + 32], rdi mov qword ptr [rsp + 24], rsi jne .LBB16_5 call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete al mov dword ptr [rsp + 4], eax mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` but the manually hoisted version just 1: ``` .LBB16_3: mov r13, qword ptr [r15] mov r14, qword ptr [r15 + 8] lea rdi, [rsp + 64] mov rsi, r13 mov rdx, r14 call example::sort_by_words mov rdi, qword ptr [rsp + 64] mov rdx, qword ptr [rsp + 16] cmp qword ptr [rsp + 80], rdx mov qword ptr [rsp + 32], rdi jne .LBB16_5 mov rsi, qword ptr [rsp + 8] call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete bpl mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
[rustc_span][perf] Hoist lookup sorted by words out of the loop. ```@lqd``` commented on rust-lang/rust#114351 asking if `sort_by_words(lookup)` is computed repeatedly. I was assuming that rustc should have no difficulties to hoist it automatically outside of the loop to avoid repeated pure computation, but according to https://godbolt.org/z/frs8Kj1rq it seems like I was wrong: original version seems to have 2 calls per loop iteration ``` .LBB16_3: mov rbx, qword ptr [r13] mov r14, qword ptr [r13 + 8] lea rdi, [rsp + 40] mov rsi, rbx mov rdx, r14 call example::sort_by_words lea rdi, [rsp + 64] mov rsi, qword ptr [rsp + 8] mov rdx, qword ptr [rsp + 16] call example::sort_by_words mov rdi, qword ptr [rsp + 40] mov rdx, qword ptr [rsp + 56] mov rsi, qword ptr [rsp + 64] cmp rdx, qword ptr [rsp + 80] mov qword ptr [rsp + 32], rdi mov qword ptr [rsp + 24], rsi jne .LBB16_5 call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete al mov dword ptr [rsp + 4], eax mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` but the manually hoisted version just 1: ``` .LBB16_3: mov r13, qword ptr [r15] mov r14, qword ptr [r15 + 8] lea rdi, [rsp + 64] mov rsi, r13 mov rdx, r14 call example::sort_by_words mov rdi, qword ptr [rsp + 64] mov rdx, qword ptr [rsp + 16] cmp qword ptr [rsp + 80], rdx mov qword ptr [rsp + 32], rdi jne .LBB16_5 mov rsi, qword ptr [rsp + 8] call qword ptr [rip + bcmp@GOTPCREL] test eax, eax sete bpl mov rsi, qword ptr [rsp + 72] test rsi, rsi jne .LBB16_8 jmp .LBB16_9 ``` This code is probably not very hot, but there is no reason to leave such a low hanging fruit.
@bors r+ rollup |
[rustc_span][perf] Remove unnecessary string joins and allocs. Comparing vectors of string parts yields the same result but avoids unnecessary `join` and potential allocation for resulting `String`. This code is cold so it's unlikely to have any measurable impact, but considering but since it's also simpler, why not? :)
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#113945 (Fix wrong span for trait selection failure error reporting) - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.) - rust-lang#114418 (bump parking_lot to 0.12) - rust-lang#114434 (Improve spans for indexing expressions) - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError) - rust-lang#114461 (Fix unwrap on None) - rust-lang#114462 (interpret: add mplace_to_ref helper method) - rust-lang#114472 (Reword `confusable_idents` lint) - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone) r? `@ghost` `@rustbot` modify labels: rollup
btw, just for fun, I've also wrote a benchmark and got
although the numbers may be different for different strings. |
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#113945 (Fix wrong span for trait selection failure error reporting) - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.) - rust-lang#114418 (bump parking_lot to 0.12) - rust-lang#114434 (Improve spans for indexing expressions) - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError) - rust-lang#114461 (Fix unwrap on None) - rust-lang#114462 (interpret: add mplace_to_ref helper method) - rust-lang#114472 (Reword `confusable_idents` lint) - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone) r? `@ghost` `@rustbot` modify labels: rollup
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#113945 (Fix wrong span for trait selection failure error reporting) - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.) - rust-lang#114418 (bump parking_lot to 0.12) - rust-lang#114434 (Improve spans for indexing expressions) - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError) - rust-lang#114461 (Fix unwrap on None) - rust-lang#114462 (interpret: add mplace_to_ref helper method) - rust-lang#114472 (Reword `confusable_idents` lint) - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone) r? `@ghost` `@rustbot` modify labels: rollup
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#113945 (Fix wrong span for trait selection failure error reporting) - rust-lang#114351 ([rustc_span][perf] Remove unnecessary string joins and allocs.) - rust-lang#114418 (bump parking_lot to 0.12) - rust-lang#114434 (Improve spans for indexing expressions) - rust-lang#114450 (Fix ICE failed to get layout for ReferencesError) - rust-lang#114461 (Fix unwrap on None) - rust-lang#114462 (interpret: add mplace_to_ref helper method) - rust-lang#114472 (Reword `confusable_idents` lint) - rust-lang#114477 (Account for `Rc` and `Arc` when suggesting to clone) r? `@ghost` `@rustbot` modify labels: rollup
Comparing vectors of string parts yields the same result but avoids unnecessary
join
and potential allocation for resultingString
. This code is cold so it's unlikely to have any measurable impact, but considering but since it's also simpler, why not? :)