-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement collation native functions functions #86895
Conversation
Tagging subscribers to this area: @dotnet/area-system-globalization Issue Detailsnull
|
Tagging subscribers to 'os-ios': @steveisok, @akoeplinger Issue Detailsnull
|
@@ -12,5 +12,18 @@ internal static partial class Globalization | |||
{ | |||
[LibraryImport(Libraries.GlobalizationNative, EntryPoint = "GlobalizationNative_CompareStringNative", StringMarshalling = StringMarshalling.Utf16)] | |||
internal static unsafe partial int CompareStringNative(string localeName, int lNameLen, char* lpStr1, int cwStr1Len, char* lpStr2, int cwStr2Len, CompareOptions options); | |||
|
|||
[LibraryImport(Libraries.GlobalizationNative, EntryPoint = "GlobalizationNative_EndsWithNative", StringMarshalling = StringMarshalling.Utf16)] | |||
[MethodImpl(MethodImplOptions.NoInlining)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why are we adding NoInlining
here and on StartsWithNative
but not on the other ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did like is done for ICU
https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Collation.cs#L34
But not sure why only these 2 functions have MethodImplOptions.NoInlining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that it is to avoid regressing the hot path of the method by inlining the PInvoke.
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Show resolved
Hide resolved
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
Mapped to Apple Native API `rangeOfString:options:range:locale:`(https://developer.apple.com/documentation/foundation/nsstring/1417348-rangeofstring?language=objc) | ||
|
||
In `rangeOfString:options:range:locale:` objects are compared by checking the Unicode canonical equivalence of their code point sequences. | ||
In cases where search string contains diaeresis and has different normalization form than in source string result can be incorrect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't limited to just the diaeresis diacritic mark is it? If it isn't limited to just diaeresis can we change it to diacritics.
Search string contains diaeresis and with source string they have same letters with different char lengths but substring is not | ||
normalized in source. example: search string: `U\u0308 and \u00FC` source string: `Source is a\u0308\u0308a and \u0075\u0308` | ||
as it is visible from example normalizaing search string to form C or D will not help to find substring in source string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I'm understanding this.
searchString U\u0308 and \u00FC
(Ü and ü
)
source string a\u0308\u0308a and \u0075\u0308
(ä̈a and ü
)
Is the search string supposed to be a substring in the source string? Was it supposed to be U\u0308
and \u00FC
should both be interpreted as substrings of a\u0308\u0308a and (\u0075\u0308)
in that it matches the portion in the parenthesis?
I was actually expecting the not covered cases to be something like, source string is a sequence of unicode code points that represent a mixture of precomposed characters and decomposed characters.
i.e. Source string H\u00EBll\u006F\u0308 World
(Hëllö World
)
Search string \u0065\u0308ll\u00F6
(ëllö
)
These would bypass the precomposed vs precomposed comparison because it would be source
string H\u00EBll\u00F6 World
and search string \u00EBll\u00F6
However, neither the fully precomposed form of the search string \u00EBll\u00FB
nor the fully decomposed form of the search string \u0065\u0308ll\u006F\u0308
would match the unicode code point sequence in the source string \u00EBll\u006F\u0308
so it would fall through the Form C and Form D fallbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. Source should be "Source is \u00DC and \u0075\u0308"
.
search string: U\u0308 and \u00FC
(Ü and ü) source string: Source is \u00DC and \u0075\u0308
(Source is Ü and ü). Will update it.
// localizedStandardRangeOfString is performing a case and diacritic insensitive, locale-aware search and finding first occurance. | ||
if ((comparisonOptions & IgnoreCase) && lNameLength == 0 && fromBeginning) | ||
{ | ||
NSRange localizedStandardRange = [sourceStrCleaned localizedStandardRangeOfString:searchStrCleaned]; | ||
if (localizedStandardRange.location != NSNotFound) | ||
{ | ||
result.location = localizedStandardRange.location; | ||
result.length = localizedStandardRange.length; | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrating comment
localizedStandardRangeOfString
is performing case and diacritic insensitive, locale-aware search by implementation, that is why I am checking for IgnoreCase
and if locale is not passed .(see https://developer.apple.com/documentation/foundation/nsstring/1413574-localizedstandardrangeofstring)
Right, I see that it finds the first match with a case insensitive and diacritic insensitive search, but are the other 7 permutations of these options (!(compareOptions & IgnoreCase) || lNameLength != 0 || !fromBeginning
) covered by the following logic?
I'm wondering if this block is truly necessary. If ((comparisonOptions & IgnoreCase) && lNameLength == 0 && !fromBeginning)
is covered by the following logic, I would imagine that this case is covered as well without this block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does lNameLength == 0
mean locale-aware? If its supposed to be the system locale [NSLocale systemLocale]
, then shouldn't this check for localeName == NULL || lNameLength == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the IgnoreNonSpace
be the diacritic insensitive condition as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
localizedStandardRangeOfString
is not necessary , if it makes the function more complicated I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the block suggested to me that localizedStandardRangeOfString
could cover one specific case that the rest of the following logic couldn't, which felt odd specifically because that this wouldn't be hit if fromBeginning
was false. Another consideration is that it seems to ignore the rest of the flags of options
besides case insensitive and diacritic insensitive, so I wonder if this block would have been correct in those scenarios (if we had the test cases for them). I think it could make sense to keep it if it was much faster than rangeOfString:options:range:locale:
, but it seems cleaner to rely on the following logic if it can be handled correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it handles correctly all cases without localizedStandardRangeOfString
.
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for your patience!
1. Search string contains diaeresis and has same normalization form as in source string. | ||
2. Search string contains diaeresis but with source string they have same letters with different char lengths but substring is normalized in source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: diacritic
Search string contains diacritics and with source string they have same letters with different char lengths but substring is not | ||
normalized in source. example: search string: `U\u0308 and \u00FC` (Ü and ü) source string: `Source is \u00DC and \u0075\u0308` (Source is Ü and ü) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Source string's intended substring match containing characters of mixed composition forms cannot be matched by 2.
because partial precomposition/decomposition is not performed.
@@ -107,13 +107,16 @@ public static IEnumerable<object[]> IsSuffix_TestData() | |||
{ | |||
yield return new object[] { s_hungarianCompare, "foobardzsdzs", "rddzs", CompareOptions.None, false, 0 }; | |||
yield return new object[] { s_frenchCompare, "\u0153", "oe", CompareOptions.None, false, 0 }; | |||
yield return new object[] { s_invariantCompare, "\uD800\uDC00", "\uDC00", CompareOptions.None, false, 0 }; | |||
yield return new object[] { s_invariantCompare, "\uD800\uDC00", "\uDC00", CompareOptions.IgnoreCase, false, 0 }; | |||
if (!PlatformDetection.IsHybridGlobalizationOnOSX) // TODO: check this for OSX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still a todo?
return result; | ||
|
||
// in case search string is inside source string but we can't find the index return -2 | ||
result.location = -2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may conflict with the Debug.Assert(result.Location != -2);
in CompareInfo.OSX.cs
because we might actually hit -2
if we hit the unsupported mixed composition source string case. Perhaps we can use something different to indicate that this is a case that is currently unsupported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case changed return value to -3 and added exception.
@@ -0,0 +1,11 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. | |||
// The .NET Foundation licenses this file to you under the MIT license. | |||
using System.Runtime.InteropServices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using System.Runtime.InteropServices; |
Unnecessary
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
using System.Runtime.InteropServices; | ||
namespace System.Globalization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace System.Globalization | |
internal static partial class Interop |
This type is not specific to globalization. It should have neutral namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the unmanaged side is not returning the NSRange type in the latest iteration. It returns its own Range type. In that case, it should not be called NSRange. It should be called Range to match the unmanaged side and it should be in System.Globalization.iOS.cs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using NSRange
https://developer.apple.com/documentation/foundation/nsrange?language=objc on pal_collation.m,
but to make it compiled on https://github.com/dotnet/runtime/blob/main/src/native/libs/System.Globalization.Native/pal_collation.h I needed to add Range type.
Do you mean Range
struct should be added in System.Globalization.iOS.cs ? I couldn't find that file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant src/libraries/Common/src/Interop/Interop.Collation.OSX.cs
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
Failures are not related. |
Implements a chunk of apple native-api based globalization.
Old, icu-based private API:
New, non-icu private API:
Affected public API :
APIs that will not be supported
Fixes #86699
Contributes to #80689
cc @SamMonoRT