-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor memchr to allow optimization #76971
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
I'm not the right reviewer for this. Someone from libs should review. |
r? @dtolnay if I may ask for a review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@bors r+ |
📌 Commit 37f08c7 has been approved by |
... I don't see how this test even works since Also the test is passing because it is calling @bors r- |
That's fair, although I remember the test fail without the patch. o.O |
I just checked it locally: your test passes without the patch. |
@Amanieu and fails when I check for slice_contains. I'm looking into it, thanks a lot for noticing! |
adbda0d
to
ac0b79d
Compare
I've figured out what the basis of my assumption was: https://godbolt.org/z/sozbsr I've tried the refactor first on godbolt.org and it produced the expected assembly. I guess the difference comes from the code being in the same crate as the called, so it's inlined without explicit #[inline] hints. |
library/core/src/slice/memchr.rs
Outdated
memchr_general_case(x, text) | ||
} | ||
|
||
#[inline(never)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't this inhibit const folding by llvm in some cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in my godbolt-example, as far as I can tell, after 16 elements the whole memchr code would be inlined instead of folding. So technically, yes, but I'm not sure in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, currently there is no const folding here anyway :)
ac0b79d
to
d023436
Compare
@bors r+ |
📌 Commit d023436 has been approved by |
fd0538a
to
89b8a97
Compare
I've modified |
@Amanieu I can add the whole path to the test but I don't think it matters. Let me know if you want me to do it regardless. |
@bors r+ |
📌 Commit 89b8a97 has been approved by |
⌛ Testing commit 89b8a97 with merge d10740de8e266e96b0955da0e642a517920e8d20... |
💔 Test failed - checks-actions |
Looks like this is a lot more brittle than I thought... Oh well. Thanks for everyone's time. |
I've given this one some more thought. It's cool to enable an optimization, but since the array size for which the fast path can apply is architecture-dependent, I think I'm going to only enable the test for x86-64. Since the patch shouldn't break anything, I think this is okay. Let me know if this is OK or how I should proceed. |
@bors r+ rollup=never |
📌 Commit de623bf has been approved by |
☀️ Test successful - checks-actions, checks-azure |
Closes #75659
The implementation already uses naive search if the slice if short enough, but the case is complicated enough to not be optimized away. This PR refactors memchr so that it exists early when the slice is short enough.
Codegen-wise, as shown in #75659, memchr was not inlined previously so the only way I could find to test this is to check if there is no memchr call. Let me know if there is a more robust solution here.