-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize String#to_utf16
#14671
Optimize String#to_utf16
#14671
Conversation
`each_char` already handles replacement chars
The performance benefit is questionable because `ascii_only?` iterates the string bytes as well, leading to three consecutive iterations in the worst case.
end | ||
|
||
# Append null byte | ||
slice[i] = 0_u16 | ||
appender << 0_u16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice#new already guarantees that the returned memory is zeroed out, so this could be removed
..or GC.malloc_atomic
could be used to improve performance (+12% for me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..or GC.malloc_atomic could be used to improve performance (+12% for me).
That's unexpected. Slice(UInt16).new
should use malloc_atomic
implicitly.
This is not directly obvious, because there's some indirection through the compiler. Pointer.malloc
is a primitive and the compiler decides whether that calls GC.malloc
or GC.malloc_atomic
depending on the type (whether it could contain internal pointers or not).
Can you share the patch for +12% performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice(UInt16).new
should usemalloc_atomic
implicitly.
Yes, but crystal always zeroes out the memory, even if the GC already guarantees that the memory was zeroed out:
crystal/src/compiler/crystal/codegen/codegen.cr
Line 2174 in 38be359
memset pointer, int8(0), size_t(size) |
So atomically allocated memory is always cleared once, non-atomic memory is always cleared twice.
In this case, the memory does not need clearing at all.
Can you share the patch for +12% performance?
- slice = Slice(UInt16).new(u16_size + 1)
+ slice = Slice(UInt16).new(GC.malloc_atomic(sizeof(UInt16).to_u64! &* (u16_size + 1)).as(UInt16*), u16_size + 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. It's not atomic alloc but the clearing which makes a difference and we can prevent that with GC.malloc_atomic
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to leave this as is.
Explicitly adding a trailing null shouldn't have any significant impact. So it doesn't matter to do it despite the memory being zeroed out. However, it makes sure the algorithm works correctly with non-zeroed memory. We should discuss this in a separate issue (#14687).
This patch contains a series of optimizations to
String#to_utf16
to improve performance:each_char
already excludes invalid codepoints, so we only have to handle the encoding as one or two UInt16.ascii_only?
branch. The performance benefit is questionable becauseascii_only?
iterates the string. With optimizations to the regular loop, this special case doesn't provide much extra performance, so it's expendable.