Optimize `String#to_utf16` #14671

straight-shoota · 2024-06-06T13:34:10Z

This patch contains a series of optimizations to String#to_utf16 to improve performance:

- Remove branches for impossible codepoints. `each_char` already excludes invalid codepoints, so we only have to handle the encoding as one or two `UInt16`.
- Drop the `ascii_only?` branch. The performance benefit is questionable because `ascii_only?` iterates the string itself. With the optimizations to the regular loop, this special case doesn't provide much extra performance, so it's expendable.
- Use a pointer appender to avoid bounds checks on every slice assignment. It is also more convenient than tracking an index manually.
- Use wrapping math operators. These operations cannot overflow, so we can use the unchecked wrapping variants for better performance.
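The four points above can be put together in a minimal sketch. This is an illustration under assumptions, not the merged code: `encode_utf16` is a hypothetical free-standing helper, and the exact expressions in `src/string/utf16.cr` may differ.

```crystal
# Hypothetical helper illustrating the optimized loop; not the stdlib method.
def encode_utf16(str : String) : Slice(UInt16)
  # First pass: count the UTF-16 code units needed.
  u16_size = 0
  str.each_char do |char|
    u16_size += char.ord < 0x1_0000 ? 1 : 2
  end

  # One extra unit for the trailing null.
  slice = Slice(UInt16).new(u16_size + 1)
  # The appender writes through the raw pointer, so there is no bounds check
  # per write and no index variable to maintain.
  appender = slice.to_unsafe.appender

  # Second pass: each_char only yields valid codepoints (invalid bytes arrive
  # as the replacement character), so two cases are enough.
  str.each_char do |char|
    ord = char.ord
    if ord < 0x1_0000
      # Basic Multilingual Plane: a single code unit.
      appender << ord.to_u16!
    else
      # Supplementary plane: a surrogate pair. The wrapping operators
      # (&-, &+) skip overflow checks, which cannot trigger here.
      ord &-= 0x1_0000
      appender << (0xd800 &+ ((ord >> 10) & 0x3ff)).to_u16!
      appender << (0xdc00 &+ (ord & 0x3ff)).to_u16!
    end
  end

  # Explicit trailing null.
  appender << 0_u16

  # The returned slice excludes the trailing null.
  slice[0, u16_size]
end

encode_utf16("a😀").to_a # => [97, 55357, 56832]
```

U+1F600 is outside the Basic Multilingual Plane, so it is emitted as the surrogate pair 0xD83D 0xDE00 (55357 and 56832 in decimal).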


BlobCodes · 2024-06-06T15:43:51Z

src/string/utf16.cr

    end

    # Append null byte
-    slice[i] = 0_u16
+    appender << 0_u16


`Slice.new` already guarantees that the returned memory is zeroed out, so this could be removed.

..or GC.malloc_atomic could be used to improve performance (+12% for me).

straight-shoota · 2024-06-08

> ..or GC.malloc_atomic could be used to improve performance (+12% for me).

That's unexpected. Slice(UInt16).new should use malloc_atomic implicitly.
This is not directly obvious, because there's some indirection through the compiler. Pointer.malloc is a primitive and the compiler decides whether that calls GC.malloc or GC.malloc_atomic depending on the type (whether it could contain internal pointers or not).

Can you share the patch for +12% performance?

BlobCodes · 2024-06-08

> Slice(UInt16).new should use malloc_atomic implicitly.

Yes, but Crystal always zeroes out the memory, even if the GC already guarantees that the memory was zeroed out:

crystal/src/compiler/crystal/codegen/codegen.cr, line 2174 in 38be359:

    memset pointer, int8(0), size_t(size)

So atomically allocated memory is always cleared once, and non-atomic memory is always cleared twice.
In this case, the memory does not need clearing at all.

> Can you share the patch for +12% performance?

-    slice = Slice(UInt16).new(u16_size + 1)
+    slice = Slice(UInt16).new(GC.malloc_atomic(sizeof(UInt16).to_u64! &* (u16_size + 1)).as(UInt16*), u16_size + 1)

straight-shoota · 2024-06-08

Gotcha. It's not the atomic allocation but the clearing that makes the difference, and we can avoid the clearing by calling GC.malloc_atomic directly.

straight-shoota · 2024-06-10

I think it's fine to leave this as is.
Explicitly writing the trailing null shouldn't have any significant performance impact, so it is harmless even though the memory is already zeroed out. It also ensures the algorithm stays correct with non-zeroed memory. The allocation of non-zeroed memory should be discussed in a separate issue (#14687).
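To illustrate why the explicit terminator matters, here is a small sketch that fills a buffer obtained straight from `GC.malloc_atomic`, following the patch quoted earlier in the thread. `uninitialized_utf16_buffer` is a hypothetical helper name, and the sketch assumes only that the contents of such a buffer should not be relied upon.

```crystal
# Hypothetical helper: room for `size` UInt16 values, allocated without the
# extra clearing that Slice(UInt16).new(size) performs.
def uninitialized_utf16_buffer(size : Int32) : Slice(UInt16)
  bytes = sizeof(UInt16).to_u64! &* size
  Slice(UInt16).new(GC.malloc_atomic(bytes).as(UInt16*), size)
end

units = "hi".to_utf16 # two code units, trailing null not included
buffer = uninitialized_utf16_buffer(units.size + 1)

appender = buffer.to_unsafe.appender
units.each { |unit| appender << unit }
# Every element, including the terminator, is written explicitly, so the
# result does not depend on what the allocator left in the memory.
appender << 0_u16

buffer.to_a # => [104, 105, 0]
```

Because every element of the buffer is assigned before it is read, the code behaves the same whether or not the allocation was zeroed.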

straight-shoota added 4 commits June 6, 2024 15:26

- 281c227: Remove branches for impossible codepoints. `each_char` already handles replacement chars.
- 61615cb: Drop ascii_only? branch. The performance benefit is questionable because `ascii_only?` iterates the string bytes as well, leading to three consecutive iterations in the worst case.
- 6aa81f9: Use appender to avoid bounds checks.
- b58d679: Use wrapping math operators.

straight-shoota added performance kind:refactor topic:stdlib:text labels Jun 6, 2024

straight-shoota self-assigned this Jun 6, 2024

BlobCodes mentioned this pull request Jun 6, 2024: UTF-16 string literals #14670 (Closed)

ysbaddaden approved these changes Jun 7, 2024


straight-shoota added this to the 1.13.0 milestone Jun 7, 2024

BlobCodes reviewed Jun 7, 2024


straight-shoota mentioned this pull request Jun 10, 2024: Allocation of non-zeroed memory for performance boost #14687 (Open)

straight-shoota merged commit ef04b2e into crystal-lang:master Jun 12, 2024
61 checks passed

straight-shoota deleted the refactor/string-to_utf16 branch June 12, 2024 13:14

BrewTestBot mentioned this pull request Jul 10, 2024: crystal 1.13.0 Homebrew/homebrew-core#176873 (Merged)
