Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IndexOf Optimization Code #108499

Merged
merged 4 commits into from
Oct 4, 2024
Merged

Conversation

tarekgh
Copy link
Member

@tarekgh tarekgh commented Oct 2, 2024

Fixes #108424

We have optimization code that attempts to perform the IndexOf operation using ASCII characters until the operation completes or encounters a character that requires linguistic processing, at which point it falls back to the slower ICU path. However, there was an issue where, if the source was longer than the value, the optimization would reach the boundary in the source and incorrectly conclude that no match was found. The code wasn't checking the remaining characters in the source to determine if it needed to fall back to the ICU path, which caused IndexOf to return incorrect results.

Example " est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 returns wrong result -1 while "est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 returns correct result 0.

@tarekgh
Copy link
Member Author

tarekgh commented Oct 2, 2024

@jkotas as you are familiar with the changed code, could you please help review this fix. Note, I am planning to port this fix to the 9.0 release and possibly to 8.0 too as a partner team reported it.

CC @ericstj @rhalaly

@jkotas
Copy link
Member

jkotas commented Oct 3, 2024

Can the same problem happen in IndexOfOrdinalHelper (it is a method with very similar structure that immediately follows).

@@ -192,6 +195,12 @@ private unsafe int IndexOfOrdinalIgnoreCaseHelper(ReadOnlySpan<char> source, Rea
Next: ;
}

// Before we return -1, check if the remaining source contains any special or non-Ascii characters.
if (remainingSource.ContainsAnyExcept(s_nonSpecialAsciiChars))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to go through the whole remainingSource again? It looks expensive.

Copy link
Member Author

@tarekgh tarekgh Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remainingSource is the remaining part of the source which has not been processed before. So, we are not going through it again. Also, we do that only when there is no match found in the previous part of the source. I expect in common cases remainingSource will be very short as it will have length less than the target length.

@tarekgh
Copy link
Member Author

tarekgh commented Oct 3, 2024

Can the same problem happen in IndexOfOrdinalHelper (it is a method with very similar structure that immediately follows).

I have updated IndexOfOrdinalHelper too. In reality, it will be hard to reproduce the problem with IndexOfOrdinalHelper because it is called when doing case sensitive only and hard to find a non-ascii character that can equal to Ascii character/string without passing more comparison options. Passing more comparison options will causes IndexOfOrdinalHelper to not get called.

@tarekgh
Copy link
Member Author

tarekgh commented Oct 3, 2024

@jkotas please let me know if you have more comments or if we are good to go. Thanks!

@tarekgh
Copy link
Member Author

tarekgh commented Oct 4, 2024

@stephentoub @ericstj is possible to help code reviewing this fix? @jkotas is off and I want to get this soon to 9.0 release and possibly to 8.0 too.

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks correct to me

@tarekgh tarekgh merged commit f1de7b0 into dotnet:main Oct 4, 2024
146 of 148 checks passed
@tarekgh
Copy link
Member Author

tarekgh commented Oct 4, 2024

/backport to release/9.0

Copy link
Contributor

github-actions bot commented Oct 4, 2024

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/11188199484

@tarekgh
Copy link
Member Author

tarekgh commented Oct 4, 2024

/backport to release/8.0-staging

Copy link
Contributor

github-actions bot commented Oct 4, 2024

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/11188206621

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in IndexOf with InvariantCultureIgnoreCase in .NET 5 and above?
3 participants