Faster IndexOfAny for IgnoreCase Ascii letters #96588
Tagging subscribers to this area: @dotnet/area-system-buffers
Issue Details
The idea is that if we're searching for both cases of an ASCII letter, we can mask off the bit where they differ and search for only one. This PR adds a variant of packed IndexOfAny(char) and IndexOfAny(char, char) that does the | 0x20 input transformation and teaches SearchValues to use it.
Benchmark:
public class SearchValuesPackedIgnoreCase
{
    private static readonly SearchValues<char> s_asciiTwoLetters = SearchValues.Create("Aa");
    private static readonly SearchValues<char> s_asciiFourLetters = SearchValues.Create("AaBb");
    private static readonly string s_text = new string('\n', 20_000);

    [Benchmark] public int IndexOfOneIgnoreCase() => s_text.AsSpan().IndexOfAny(s_asciiTwoLetters);
    [Benchmark] public int IndexOfTwoIgnoreCase() => s_text.AsSpan().IndexOfAny(s_asciiFourLetters);
}
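For readers unfamiliar with the API the benchmark exercises, here is a minimal usage sketch. The class and method names (`SearchValuesUsageSketch`, `FirstA`) are hypothetical and chosen only for illustration; `SearchValues.Create` and `IndexOfAny` are used the same way as in the benchmark above.

```csharp
using System;
using System.Buffers;

public static class SearchValuesUsageSketch
{
    // Same set as s_asciiTwoLetters in the benchmark: both cases of one ASCII letter.
    private static readonly SearchValues<char> s_upperOrLowerA = SearchValues.Create("Aa");

    // Hypothetical helper: index of the first 'A' or 'a' in text, or -1 if there is none.
    public static int FirstA(ReadOnlySpan<char> text) => text.IndexOfAny(s_upperOrLowerA);
}
```

`SearchValuesUsageSketch.FirstA("xyzA")` returns 3. For the benchmark's input of 20,000 newline characters it returns -1, so each benchmark call scans the entire input without finding a match.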
Force-pushed from e67fc77 to 73332dd.
Resolved review threads:
...aries/System.Private.CoreLib/src/System/SearchValues/Any1CharPackedIgnoreCaseSearchValues.cs (outdated)
src/libraries/System.Private.CoreLib/src/System/SearchValues/SearchValues.cs
src/libraries/System.Private.CoreLib/src/System/SearchValues/SearchValues.cs
...aries/System.Private.CoreLib/src/System/SearchValues/Any1CharPackedIgnoreCaseSearchValues.cs
Is this pending anything or can it be merged? I saw a couple of questions and an ask from Stephen for a comment to be added, but it's unclear whether you want to handle that in a follow-up or not.
This one is waiting on me; I still have to follow up on Stephen's feedback.
This pull request has been automatically marked
Force-pushed from 73332dd to 5a2568b.
The idea is that if we're searching for both cases of an ASCII letter, we can do it by masking off the bit where they differ and only search for one. E.g. `(input | 0x20) is 'a' or 'b'` instead of `input is 'a' or 'A' or 'b' or 'B'`.

This PR adds a variant of packed `IndexOfAny(char)` and `IndexOfAny(char, char)` that does the `| 0x20` input transformation and teaches `SearchValues` to use it.

I've also updated the `RegexCompiler` to use `SearchValues` for sets of 4/5 values like the source generator does, so that it can make use of this change.

The bigger difference for `[ZzQq]` is switching from a regular 4-value `IndexOfAny` to a 2-value packed one.

I tested the same approach for `IndexOfAny(char, char, char)` and `IndexOfAnyInRange`, but the throughput there is pretty much on par with or slower than `IndexOfAnyAsciiSearcher`, so I avoided adding more code for it.
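The `| 0x20` transformation described above can be illustrated with a scalar sketch. This is not the PR's vectorized, packed implementation; `PackedIgnoreCaseSketch` and `IndexOfAnyIgnoreCase` are hypothetical names, and the sketch assumes the caller passes lowercase ASCII letters.

```csharp
using System;

public static class PackedIgnoreCaseSketch
{
    // Hypothetical scalar helper, for illustration only.
    // Assumes lowerA and lowerB are lowercase ASCII letters ('a'..'z'),
    // so both already have bit 0x20 set.
    public static int IndexOfAnyIgnoreCase(ReadOnlySpan<char> span, char lowerA, char lowerB)
    {
        for (int i = 0; i < span.Length; i++)
        {
            // OR-ing with 0x20 maps 'A'..'Z' onto 'a'..'z' and leaves 'a'..'z' unchanged.
            // (c | 0x20) == t holds exactly when c is t or t with bit 0x20 cleared,
            // so two comparisons cover four candidate characters.
            int folded = span[i] | 0x20;
            if (folded == lowerA || folded == lowerB)
            {
                return i;
            }
        }

        return -1;
    }
}
```

For example, `PackedIgnoreCaseSketch.IndexOfAnyIgnoreCase("xyzB", 'a', 'b')` returns 3. The sketch performs two comparisons per character where a direct search for `'a'`, `'A'`, `'b'`, `'B'` would need four, which is the same reduction the description attributes to going from a 4-value search to a 2-value one.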