Bug in IndexOf with InvariantCultureIgnoreCase in .NET 5 and above? #108424

Open
rhalaly opened this issue Oct 1, 2024 · 1 comment · Fixed by #108499
Labels
area-System.Globalization in-pr There is an active PR which will close this issue when it is merged
rhalaly commented Oct 1, 2024

Description

We found unexpected behavior in the string IndexOf method.

The official Unicode specification for Special Casing states that the uppercase form of the character is ST.

As expected, the following comparison returns true.

string.Equals("est", "est", System.StringComparison.InvariantCultureIgnoreCase) // returns True

When we move on to the IndexOf method, the following code returns 0, which is expected, since both strings are equivalent under the invariant culture when case is ignored.

"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase) // returns 0

However, when we put any letters from the English alphabet or spaces at the beginning of the string, we start getting -1. If we use other letters instead (Cyrillic, Arabic, Hebrew, Latin with diacritics), it gives the correct result again. Examples are in the next section.

Reproduction Steps

"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 ✅
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"".IndexOf("st", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"ćććest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"אאאest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"фффest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅

Expected behavior

IndexOf should behave the same way regardless of the other characters in the string. So for the examples where we got -1:

" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)

we would expect to get 1 and 3, respectively.

Actual behavior

IndexOf with InvariantCultureIgnoreCase is not consistent and may return the wrong result depending on the surrounding string content.

" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // -1 (Why?)

Regression?

We get the observed results under .NET 5 and above. In .NET Framework 4.7.2 we get the expected results:

"est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 0 ✅
" est".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"".IndexOf("st", System.StringComparison.InvariantCultureIgnoreCase); // 1 ✅
"cccest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"ćććest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"אאאest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅
"фффest".IndexOf("est", System.StringComparison.InvariantCultureIgnoreCase); // 3 ✅

Known Workarounds

Turning off ICU resolves the issue, but this is an unwanted workaround, since other parts of our code rely on the ICU logic.
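
For anyone else hitting this, here is a minimal sketch of that workaround. It assumes the documented System.Globalization.UseNls runtime setting, which makes .NET 5+ on Windows use NLS instead of ICU. The snippet is illustrative and not taken from our project; the setting goes in runtimeconfig.json (or runtimeconfig.template.json):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

The same switch can be set through the DOTNET_SYSTEM_GLOBALIZATION_USENLS environment variable. Note that it applies to the whole process, which is why it does not fit our case.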

Configuration

.NET 5, .NET 6, .NET 7, .NET 8
Windows 11
x64, x86

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Oct 1, 2024

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Oct 2, 2024
@tarekgh tarekgh added this to the 9.0.0 milestone Oct 4, 2024
@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Oct 4, 2024
@tarekgh tarekgh reopened this Oct 4, 2024
2 participants