Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup some string operating functions #96099

Merged
merged 16 commits into from
Jan 4, 2024

Conversation

huoyaoyuan
Copy link
Member

Replace with standard or shared ones.

Please confirm their behavior and performance characteristics are desired.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 16, 2023
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved

HRESULT hr = FString::Utf8_Unicode_Length(utf8str, & allAscii, & length);
LPWSTR buffer = (LPWSTR) AllocThrows((length + 1) * sizeof(WCHAR));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be LPSTR (UTF8).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unchanged (with whitespace off).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length -> destLen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LPWSTR buffer = (LPWSTR) AllocThrows((length + 1) * sizeof(WCHAR));
CHAR16_T* buffer = (CHAR16_T*) AllocThrows((length + 1) * sizeof(CHAR16_T));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dn-u16.h uses WCHAR while minipal/utf-8 uses CHAR16_T. They are not treated as compatible although both typedef-d from wchar_t. Is any reason that compiler supported char16_t can't be used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some platforms do not support char16_t, which is why we went with the least supported type to handle coreclr and mono simultaneously.

#ifdef TARGET_WINDOWS
typedef wchar_t CHAR16_T;
#else
typedef unsigned short CHAR16_T;
#endif

During this refactoring, it's better to directly allocate the target type; CHAR16_T for UTF16 or char for UTF8 to avoid further confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some platforms do not support char16_t,

I was asking for the exact platforms and GCC/Clang versions we need to target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macOS doesn't support char16_t so you can cross the whole IsApplePlatform series off the list. Our least supported versions support C11 and hence char16_t on Linux, so only OS headers are relevant here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macOS doesn't support char16_t

Asked a friend to confirm that char16_t is supported with latest build tool on macOS. The mentions of macOS doesn't support char16_t are quite old. On cppreference, char16_t is marked as supported by Apple clang. There's also documentation about CChar16 in Swift doc.

Can you confirm again about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in C mode, we had some problems. I don't remember the details offhand. This is the quick way to find out what doesn't work in the CI:

 #ifdef TARGET_WINDOWS
 typedef wchar_t CHAR16_T;
 #else
- typedef unsigned short CHAR16_T;
+ typedef char16_t CHAR16_T; 
 #endif 

(note that we'd still need CHAR16_T for Windows, so perhaps it's better to keep this investigation separate from this PR)

@am11
Copy link
Member

am11 commented Dec 16, 2023

Thanks for starting this work. Ideally we should remove the intermediate wrappers for string conversion and directly use minipal/utf8.h in coreclr/mono/corehost, including windows targets; but it will take a while to get there. (I started the grand change in a branch, but realized that it's best to do in batches to ease up the review like you are doing here)

Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
@huoyaoyuan
Copy link
Member Author

Now tests are passing. I need some assist about link failure on macos.

@am11
Copy link
Member

am11 commented Dec 26, 2023

I need some assist about link failure on macos.

Try main...am11:runtime:patch_gh-96099.

@huoyaoyuan
Copy link
Member Author

Hmm it is linked in Unix pal, but not minipal. Will it cause duplicated symbols on Linux?

src/coreclr/debug/daccess/task.cpp Outdated Show resolved Hide resolved
src/coreclr/debug/daccess/task.cpp Outdated Show resolved Hide resolved
src/coreclr/debug/daccess/inspect.cpp Outdated Show resolved Hide resolved
src/coreclr/utilcode/sstring.cpp Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
src/coreclr/minipal/Unix/CMakeLists.txt Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
src/coreclr/inc/corhlprpriv.h Outdated Show resolved Hide resolved
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
@huoyaoyuan
Copy link
Member Author

Since the buffer will always be sufficient, the resultant error code could only be NO_UNICODE_TRANSLATION.
I assume using errno is desired since it's in latest minipal implementation.

Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@am11 anything else?

Copy link
Member

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/coreclr/utilcode/sstring.cpp Outdated Show resolved Hide resolved
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
@AaronRobinsonMSFT AaronRobinsonMSFT merged commit bc81c55 into dotnet:main Jan 4, 2024
111 checks passed
@huoyaoyuan huoyaoyuan deleted the str-op branch January 5, 2024 02:12
@janvorli janvorli mentioned this pull request Jan 19, 2024
tommcdon added a commit to tommcdon/runtime that referenced this pull request Jan 21, 2024
@tommcdon tommcdon mentioned this pull request Jan 21, 2024
tommcdon added a commit that referenced this pull request Jan 21, 2024
* Revert "Forward minipal_get_length_utf16_to_utf8 and minipal_convert_utf16_to_utf8 from DAC (#97055)"

This reverts commit 1263107.

* Revert "Cleanup some string operating functions (#96099)"

This reverts commit bc81c55.
tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
* Revert "Forward minipal_get_length_utf16_to_utf8 and minipal_convert_utf16_to_utf8 from DAC (dotnet#97055)"

This reverts commit 1263107.

* Revert "Cleanup some string operating functions (dotnet#96099)"

This reverts commit bc81c55.
@github-actions github-actions bot locked and limited conversation to collaborators Feb 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-VM-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants