Use CFStringGetCharacters instead of calling CFStringGetBytes twice #76271

Open: AreaZR wants to merge 1 commit into main

AreaZR (Contributor) commented Sep 5, 2024

This can get the character information much faster.

al45tair (Contributor) left a comment

> This can get the character information much faster.

Do you have measurements to prove that? The issue here is that there's a good chance the CFStrings we're using here are ASCII, in which case doing CFStringGetCharacters() is going to convert to UTF-16 and then String() is going to convert that back to UTF-8, plus the buffer you've allocated in the meantime is twice as large as needed.
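
The buffer-size and round-trip concern can be illustrated without CoreFoundation at all. The following is a minimal sketch in plain Swift, assuming a hypothetical ASCII sample string (`sample`); it is not the PR's code, and it only shows the relative sizes and the two decoding routes, not any timing.

```swift
// Hypothetical sample; the review assumes the CFStrings involved are mostly ASCII.
let sample = "main.swift"

// What a UTF-16 (UniChar) buffer for this string would hold.
let utf16Units: [UInt16] = Array(sample.utf16)
// What a UTF-8 byte buffer for the same string would hold.
let utf8Bytes: [UInt8] = Array(sample.utf8)

let utf16Size = utf16Units.count * MemoryLayout<UInt16>.size
let utf8Size = utf8Bytes.count * MemoryLayout<UInt8>.size

// A native Swift String stores UTF-8, so decoding from UTF-16 transcodes
// every code unit, while decoding from UTF-8 validates and copies.
let viaUTF16 = String(decoding: utf16Units, as: UTF16.self)
let viaUTF8 = String(decoding: utf8Bytes, as: UTF8.self)

print(utf16Size, utf8Size)  // prints "20 10"
```

For ASCII input the UTF-16 buffer is exactly twice the size of the UTF-8 one, and both routes produce the same `String`.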

AreaZR (Contributor, Author) commented Sep 5, 2024

> > This can get the character information much faster.
>
> Do you have measurements to prove that? The issue here is that there's a good chance the CFStrings we're using here are ASCII, in which case doing CFStringGetCharacters() is going to convert to UTF-16 and then String() is going to convert that back to UTF-8, plus the buffer you've allocated in the meantime is twice as large as needed.

The `if let ptr = CFStringGetCStringPtr` branch is the fast path, and it covers UTF-8. Also, UTF-16 is how CFStrings are natively stored.

This can get the character information much faster.

al45tair (Contributor) commented Sep 6, 2024

> The `if let ptr = CFStringGetCStringPtr` branch is the fast path, and it covers UTF-8. Also, UTF-16 is how CFStrings are natively stored.

Well, CFStringGetCStringPtr() is the fast path, since CFString stores ASCII bytes in many cases. I like the CFStringGetCharactersPtr() call, because in the case where the CFString is UTF-16, that might indeed be faster, but using CFStringGetCharacters() rather than CFStringGetBytes() is unlikely to be any faster and might actually be slower. Why? Because if CFStringGetCharactersPtr() fails, it means the string storage isn't contiguous UTF-16. In that case, some kind of conversion is going to happen, and if you use CFStringGetCharacters() then your buffer is twice the size and you've got to traverse it at least twice (once to fill it, maybe converting to UTF-16 in the process, and once in String to convert it to UTF-8).
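
The ordering described above could be sketched roughly as follows. This is an illustration only, assuming an Apple platform where `import CoreFoundation` exposes these calls directly; the helper name `swiftString(from:)` is hypothetical and this is not the code in the PR.

```swift
import CoreFoundation

// Hypothetical helper (not the PR's code): convert a CFString to a Swift String,
// trying the zero-copy paths before falling back to a conversion.
func swiftString(from cf: CFString) -> String {
  let length = CFStringGetLength(cf)
  if length == 0 { return "" }
  let utf8 = CFStringBuiltInEncodings.UTF8.rawValue

  // 1. Fast path: the storage is already a contiguous C string (often ASCII).
  if let cPtr = CFStringGetCStringPtr(cf, utf8) {
    return String(cString: cPtr)
  }

  // 2. The storage is contiguous UTF-16: decode it in place, no extra buffer.
  if let uPtr = CFStringGetCharactersPtr(cf) {
    let units = UnsafeBufferPointer(start: uPtr, count: length)
    return String(decoding: units, as: UTF16.self)
  }

  // 3. Fallback: call CFStringGetBytes twice, first with a nil buffer to
  //    learn the UTF-8 byte count, then again to fill a right-sized buffer.
  let range = CFRangeMake(0, length)
  var needed: CFIndex = 0
  _ = CFStringGetBytes(cf, range, utf8, 0, false, nil, 0, &needed)
  if needed == 0 { return "" }

  var bytes = [UInt8](repeating: 0, count: needed)
  var used: CFIndex = 0
  _ = CFStringGetBytes(cf, range, utf8, 0, false, &bytes, needed, &used)
  return String(decoding: bytes[..<used], as: UTF8.self)
}

// Usage with a CFString built from a C string.
let cf: CFString = CFStringCreateWithCString(nil, "hello", CFStringBuiltInEncodings.UTF8.rawValue)
let converted = swiftString(from: cf)
```

In this shape the fallback allocates only as many bytes as the UTF-8 form needs, which is the trade-off being weighed against a `CFStringGetCharacters()` buffer of `length` UTF-16 units.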

In addition, you said

> This can get the character information much faster.

If that's true, you should have numbers to back it up.
