Enable Regex to use SearchValues<string> in compiled / source generator for IgnoreCase multi-strings #98791

Merged
stephentoub merged 5 commits into dotnet:main from usesearchvaluesstringinregex on Feb 23, 2024

@stephentoub stephentoub commented Feb 22, 2024

The analyzer determines a set of prefixes that can start any match, and then uses SearchValues<string> with IndexOfAny to find the next occurrence of any prefix in that set. This is currently enabled only for case-insensitive matching; we need to do some more perf validation before enabling it for case-sensitive matching.

| Method | Toolchain | Pattern | Options | Mean | Ratio |
|---|---|---|---|---:|---:|
| Count | \main\corerun.exe | `(?i)Sher[a-z]+\|Hol[a-z]+` | Compiled | 691.66 us | 1.00 |
| Count | \pr\corerun.exe | `(?i)Sher[a-z]+\|Hol[a-z]+` | Compiled | 130.90 us | 0.19 |
| Count | \main\corerun.exe | `(?i)Sherlock\|Holmes\|Watson` | Compiled | 873.37 us | 1.00 |
| Count | \pr\corerun.exe | `(?i)Sherlock\|Holmes\|Watson` | Compiled | 132.12 us | 0.15 |

Contributes to #85693
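
For readers unfamiliar with the API, here is a minimal sketch of the idea described above: build a `SearchValues<string>` from a set of prefixes and use `IndexOfAny` to jump to the next candidate start position. It assumes .NET 9 or later, where `SearchValues<string>` is public. The prefix list, the sample text, and the `CountCandidates` helper are illustrative assumptions, not the prefixes or code the regex analyzer actually produces.

```csharp
using System;
using System.Buffers;
using System.Text.RegularExpressions;

// Hypothetical prefix set for (?i)Sherlock|Holmes|Watson.
// The real analyzer computes its own set; this one is hand-written.
SearchValues<string> prefixes = SearchValues.Create(
    new[] { "Sherlock", "Holmes", "Watson" },
    StringComparison.OrdinalIgnoreCase);

string text = "Mr. SHERLOCK holmes and Dr. watson";

// Position of the first place where any prefix occurs, ignoring case.
int first = text.AsSpan().IndexOfAny(prefixes);

// Illustrative helper: counts every position at which some prefix starts.
static int CountCandidates(ReadOnlySpan<char> input, SearchValues<string> values)
{
    int count = 0;
    while (true)
    {
        int i = input.IndexOfAny(values);
        if (i < 0)
        {
            return count;
        }
        count++;
        input = input.Slice(i + 1); // resume just past this candidate start
    }
}

int candidates = CountCandidates(text, prefixes);

// For comparison, the number of matches the regex itself reports.
int matches = Regex.Count(text, "(?i)Sherlock|Holmes|Watson", RegexOptions.Compiled);

Console.WriteLine($"first={first}, candidates={candidates}, matches={matches}");
```

In this sample every candidate position is also a full match, because the prefixes are the whole alternation branches. For a pattern such as `(?i)Sher[a-z]+|Hol[a-z]+`, a prefix hit would only be a starting position that the engine still has to verify.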

…or TryFindNextStartingPosition

@MihaZupan MihaZupan left a comment

🎉

Comment on lines +51 to +52
// Arbitrary string length limit (with some wiggle room) to avoid creating strings that are longer than is useful and consuming too much memory.
const int MaxPrefixLength = 8;
Member

> longer than is useful and consuming too much memory

This is mainly about not spending too many resources on the analysis part, not about the cost of the potential SearchValues itself, right?

Member Author

Correct

Member Author

Well, the "longer than is useful" part was about SearchValues itself. Is that not the case?
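
To make the length cap under discussion concrete, here is a small sketch of what capping prefix length can look like. Only the constant `MaxPrefixLength = 8` comes from the diff. The `TruncatePrefixes` helper and its inputs are hypothetical and are not the analyzer's code. The assumption behind the sketch is that a truncated prefix is still a valid, if weaker, filter, since any match starting with the longer string also starts with its first 8 characters.

```csharp
using System;
using System.Collections.Generic;

const int MaxPrefixLength = 8; // value taken from the reviewed diff

// Hypothetical helper: caps each candidate at maxLength characters and
// drops case-insensitive duplicates that result from the truncation.
static List<string> TruncatePrefixes(IEnumerable<string> candidates, int maxLength)
{
    var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    var result = new List<string>();
    foreach (string candidate in candidates)
    {
        string prefix = candidate.Length > maxLength
            ? candidate.Substring(0, maxLength)
            : candidate;
        if (seen.Add(prefix))
        {
            result.Add(prefix);
        }
    }
    return result;
}

List<string> capped = TruncatePrefixes(
    new[] { "Sherlock", "Sherlockian", "Holmes" }, MaxPrefixLength);

Console.WriteLine(string.Join(", ", capped)); // prints "Sherlock, Holmes"
```

Here "Sherlockian" collapses to "Sherlock" after truncation, so the resulting set is smaller and cheaper to build while still covering every possible match start.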

@danmoseley
Member

A lot of work to get to this point! I guess I should remeasure the Rust benchmarks with all the alternations.

Author: stephentoub
Assignees: stephentoub
Labels: area-System.Text.RegularExpressions

Milestone: -

@stephentoub stephentoub merged commit 99b7601 into dotnet:main Feb 23, 2024
109 of 111 checks passed
@stephentoub stephentoub deleted the usesearchvaluesstringinregex branch February 23, 2024 11:28
@DrewScoggins
Member

DrewScoggins commented Feb 27, 2024
