Consider alternative Sink.writeString implementations on JVM #316

Open
fzhinkin opened this issue May 7, 2024 · 3 comments

@fzhinkin
Collaborator

fzhinkin commented May 7, 2024

On JVM, instead of reading each character separately and then encoding it to UTF-8 and writing to a buffer, it might be faster to:

  • extract chars to a CharArray and then iterate over it;
  • simply use toByteArray.

For other libraries, namely kotlinx.serialization, some of these approaches performed better. While quick ad-hoc experiments didn't show any benefit for kotlinx-io, it makes sense to investigate this thoroughly.
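
To make the two alternatives concrete, here is a minimal sketch using only the Kotlin/JVM standard library. A `ByteArrayOutputStream` stands in for the kotlinx-io buffer, and `encodeUtf8`, `writeViaCharArray`, and `writeViaByteArray` are hypothetical names for illustration, not kotlinx-io APIs. Replacing unpaired surrogates with `?` is an assumption chosen to match what `String.toByteArray` does on the JVM.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical helper: encodes chars[0 until length] as UTF-8 into `out`.
// Unpaired surrogates are written as '?' (an assumption of this sketch).
fun encodeUtf8(chars: CharArray, length: Int, out: ByteArrayOutputStream) {
    var i = 0
    while (i < length) {
        val c = chars[i].code
        when {
            c < 0x80 -> { out.write(c); i++ }
            c < 0x800 -> {
                out.write(0xC0 or (c shr 6))
                out.write(0x80 or (c and 0x3F))
                i++
            }
            c !in 0xD800..0xDFFF -> {
                out.write(0xE0 or (c shr 12))
                out.write(0x80 or ((c shr 6) and 0x3F))
                out.write(0x80 or (c and 0x3F))
                i++
            }
            else -> {
                // Surrogate: needs a following low surrogate to form a code point.
                val low = if (i + 1 < length) chars[i + 1].code else 0
                if (c > 0xDBFF || low !in 0xDC00..0xDFFF) {
                    out.write('?'.code)
                    i++
                } else {
                    val cp = 0x10000 + ((c and 0x3FF) shl 10) + (low and 0x3FF)
                    out.write(0xF0 or (cp shr 18))
                    out.write(0x80 or ((cp shr 12) and 0x3F))
                    out.write(0x80 or ((cp shr 6) and 0x3F))
                    out.write(0x80 or (cp and 0x3F))
                    i += 2
                }
            }
        }
    }
}

// Alternative 1: copy all chars out in one bulk call, then iterate over the array.
fun writeViaCharArray(s: String, out: ByteArrayOutputStream) {
    val chars = CharArray(s.length)
    s.toCharArray(chars, 0, 0, s.length) // delegates to String.getChars on JVM
    encodeUtf8(chars, s.length, out)
}

// Alternative 2: let the JDK encode the whole string, then copy the bytes.
fun writeViaByteArray(s: String, out: ByteArrayOutputStream) {
    out.write(s.toByteArray(Charsets.UTF_8))
}
```

Both functions should produce identical bytes for well-formed input; they differ only in where the encoding work and the temporary allocation happen.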

@fzhinkin
Collaborator Author

A combination of String::toByteArray and UnsafeBufferOperations::moveToTail shows better performance for strings whose chars can all be encoded using byte sequences of the same length. However, the current implementation significantly outperforms the String::toByteArray-based approach on strings whose characters require byte sequences of varying lengths.
And, of course, String::toByteArray results in a higher allocation rate.
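
The thread does not show the benchmark inputs, so as an illustration of the two categories only, the sketch below reports which UTF-8 byte lengths occur in a string. `utf8Width` and `utf8Widths` are hypothetical helpers, and the code assumes well-formed input (no unpaired surrogates).

```kotlin
// UTF-8 encoded length of a single Unicode code point.
fun utf8Width(codePoint: Int): Int = when {
    codePoint < 0x80 -> 1
    codePoint < 0x800 -> 2
    codePoint < 0x10000 -> 3
    else -> 4
}

// Set of distinct UTF-8 byte lengths used by the code points of `s`.
// A single-element result means every code point encodes to the same length.
fun utf8Widths(s: String): Set<Int> {
    val widths = sortedSetOf<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        widths += utf8Width(cp)
        i += Character.charCount(cp) // skip both halves of a surrogate pair
    }
    return widths
}
```

Under this reading, an ASCII-only string such as "hello" falls into the same-length category, while a string mixing ASCII, `€`, and emoji falls into the varying-length one.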

@qwwdfsad
Collaborator

In serialization, we leverage the intrinsified String::getChars (pros: vectorized, much faster unpacking of compact strings, no range checks) and also rely on the fact that our CharArrays are pooled, so there are no allocations.
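
A rough sketch of the pooled-CharArray idea, using only the standard library. This is not the kotlinx.serialization implementation; `CharArrayPool`, `forEachPooledChunk`, and the 8192-char array size are assumptions made for illustration.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical pool of fixed-size char arrays (the size is an arbitrary choice).
object CharArrayPool {
    private const val ARRAY_SIZE = 8192
    private val pool = ConcurrentLinkedQueue<CharArray>()

    // Reuses a pooled array if one is available, otherwise allocates.
    fun take(): CharArray = pool.poll() ?: CharArray(ARRAY_SIZE)

    fun release(array: CharArray) {
        pool.offer(array)
    }
}

// Copies the string into a pooled array chunk by chunk and hands each chunk
// to `action` together with the number of valid chars in it.
// Note: a chunk boundary may split a surrogate pair; a real encoder would
// have to carry the high surrogate over to the next chunk.
fun String.forEachPooledChunk(action: (chars: CharArray, length: Int) -> Unit) {
    val chars = CharArrayPool.take()
    try {
        var start = 0
        while (start < length) {
            val end = minOf(start + chars.size, length)
            // On JVM this overload delegates to String.getChars.
            toCharArray(chars, 0, start, end)
            action(chars, end - start)
            start = end
        }
    } finally {
        CharArrayPool.release(chars)
    }
}
```

Once the pool is warm, each call performs bulk copies into a reused array rather than allocating a fresh CharArray per string.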

@fzhinkin
Collaborator Author

For kotlinx-io, it seems like such an approach does not provide any significant performance improvements on average:

public fun Sink.writeStringJvm2(string: String, startIndex: Int = 0, endIndex: Int = string.length) {

https://jmh.morethan.io/?source=https://gist.githubusercontent.com/fzhinkin/a11a2ce595cadb8fba700cdbe18a6f4f/raw/fbb87909636731439aac80948fa023bcc10d4269/toCharArray-based-writeString.json

In some scenarios performance is better; in others it is worse.
