[PR 4/5] Rewrite existing extensions using unsafe API #337

Merged
fzhinkin merged 4 commits into develop from bulk-api-part-3
Aug 23, 2024

@fzhinkin fzhinkin commented Jun 7, 2024

This PR reimplements existing extensions to use unsafe API introduced in #334 and #336.

In the next patch, Segment's data field will be encapsulated.

Related to #135

@shanshin shanshin left a comment

Are there any performance tests? It seems to me that a slight performance drawdown is inevitable

@@ -22,7 +22,9 @@ public fun Buffer.snapshot(): ByteString {
         var curr = head
         do {
             check(curr != null) { "Current segment is null" }
-            append(curr.data, curr.pos, curr.limit)
+            for (idx in 0 until curr.size) {

Doesn't this degrade performance?
It seems to me that writing byte by byte will be very slow due to the large number of checks and frequent copyInto calls.

Thanks, I'll reimplement it. We don't track Buffer.snapshot performance, so it didn't ring a bell when I plunked in a naive implementation.
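
To illustrate the concern raised in this review thread, here is a minimal sketch contrasting a per-byte copy with a bulk copy. `ToySegment`, `getChecked`, `snapshotByteByByte`, and `snapshotBulk` are hypothetical names made up for this example; they are not the kotlinx-io `Segment` or `Buffer.snapshot` implementation, and the sketch assumes a segment is simply a byte array with `pos` and `limit` bounds.

```kotlin
// Hypothetical stand-in for a buffer segment; NOT the kotlinx-io Segment class.
class ToySegment(val data: ByteArray, val pos: Int, val limit: Int) {
    val size: Int get() = limit - pos

    // Per-byte accessor that performs a bounds check on every call.
    fun getChecked(index: Int): Byte {
        require(index in 0 until size) { "index $index is out of bounds [0, $size)" }
        return data[pos + index]
    }
}

// Naive variant: one checked read and one write per byte.
fun snapshotByteByByte(segments: List<ToySegment>): ByteArray {
    val result = ByteArray(segments.sumOf { it.size })
    var offset = 0
    for (segment in segments) {
        for (idx in 0 until segment.size) {
            result[offset++] = segment.getChecked(idx)
        }
    }
    return result
}

// Bulk variant: a single copyInto call per segment.
fun snapshotBulk(segments: List<ToySegment>): ByteArray {
    val result = ByteArray(segments.sumOf { it.size })
    var offset = 0
    for (segment in segments) {
        segment.data.copyInto(result, offset, segment.pos, segment.limit)
        offset += segment.size
    }
    return result
}

fun main() {
    val segments = listOf(
        ToySegment(byteArrayOf(0, 1, 2, 3), pos = 1, limit = 3),
        ToySegment(byteArrayOf(4, 5, 6), pos = 0, limit = 3),
    )
    // Both variants should produce the same bytes; only the amount of per-byte work differs.
    println(snapshotByteByByte(segments).contentEquals(snapshotBulk(segments)))
}
```

Both functions return the same bytes; the bulk variant just does its bounds handling once per segment instead of once per byte, which is the kind of difference the reviewer was pointing at.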

fzhinkin commented Jul 1, 2024

> Are there any performance tests? It seems to me that a slight performance drawdown is inevitable

Sure, I'll publish results.

@fzhinkin fzhinkin changed the base branch from bulk-api-part-2 to develop July 1, 2024 14:21

fzhinkin commented Jul 3, 2024

@shanshin, so, regarding the performance.

Of course, adding a level of abstraction over a byte array won't make it faster compared to direct access to the byte array. And that's true for older JDK versions. Here, for instance, are benchmarking results collected using JDK11:

tl;dr: on average, everything slowed down and UTF8-processing became 15-50% slower:

Summary table for Utf8StringBenchmark
| Benchmark name | params | Baseline avgt | Baseline std | Alternative avgt | Alternative std | Relative difference, % |
| --- | --- | --- | --- | --- | --- | --- |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'ascii', 'length': '20', 'minGap': '128'} | 33.744 ns | ±0.229 ns | 39.719 ns | ±0.476 ns | -17.7 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'ascii', 'length': '2000', 'minGap': '128'} | 1.419 us | ±0.006 us | 1.719 us | ±0.008 us | -21.1 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'ascii', 'length': '200000', 'minGap': '128'} | 149.209 us | ±4.983 us | 184.376 us | ±0.888 us | -23.6 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'utf8', 'length': '20', 'minGap': '128'} | 88.202 ns | ±1.310 ns | 130.807 ns | ±1.729 ns | -48.3 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'utf8', 'length': '2000', 'minGap': '128'} | 8.585 us | ±0.060 us | 11.791 us | ±0.257 us | -37.4 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'utf8', 'length': '200000', 'minGap': '128'} | 972.234 us | ±4.234 us | 1360.528 us | ±7.276 us | -39.9 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'sparse', 'length': '20', 'minGap': '128'} | 49.853 ns | ±2.116 ns | 96.586 ns | ±3.500 ns | -93.7 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'sparse', 'length': '2000', 'minGap': '128'} | 1.809 us | ±0.388 us | 5.543 us | ±0.322 us | -206.4 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'sparse', 'length': '200000', 'minGap': '128'} | 165.245 us | ±0.201 us | 592.840 us | ±3.039 us | -258.8 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '2bytes', 'length': '20', 'minGap': '128'} | 91.192 ns | ±0.948 ns | 103.737 ns | ±1.490 ns | -13.8 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '2bytes', 'length': '2000', 'minGap': '128'} | 6.330 us | ±0.048 us | 7.089 us | ±0.356 us | -12.0 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '2bytes', 'length': '200000', 'minGap': '128'} | 777.784 us | ±3.130 us | 806.668 us | ±29.196 us | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '3bytes', 'length': '20', 'minGap': '128'} | 117.007 ns | ±0.481 ns | 134.966 ns | ±1.396 ns | -15.3 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '3bytes', 'length': '2000', 'minGap': '128'} | 8.304 us | ±0.206 us | 9.804 us | ±0.127 us | -18.1 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '3bytes', 'length': '200000', 'minGap': '128'} | 921.198 us | ±1.287 us | 1081.292 us | ±5.090 us | -17.4 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '4bytes', 'length': '20', 'minGap': '128'} | 85.708 ns | ±0.332 ns | 111.979 ns | ±2.808 ns | -30.7 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '4bytes', 'length': '2000', 'minGap': '128'} | 6.212 us | ±0.042 us | 7.686 us | ±0.104 us | -23.7 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '4bytes', 'length': '200000', 'minGap': '128'} | 693.771 us | ±1.806 us | 874.371 us | ±4.378 us | -26.0 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'bad', 'length': '20', 'minGap': '128'} | 75.094 ns | ±0.484 ns | 100.970 ns | ±0.679 ns | -34.5 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'bad', 'length': '2000', 'minGap': '128'} | 6.204 us | ±0.022 us | 8.421 us | ±0.521 us | -35.7 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'bad', 'length': '200000', 'minGap': '128'} | 661.477 us | ±3.348 us | 1049.524 us | ±6.606 us | -58.7 % |
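
As a reading aid for the "Relative difference" column: the published numbers appear consistent with `(baseline - alternative) / baseline * 100`, with `N/A` shown where the `avgt ± std` intervals of the two runs overlap. That rule is an inference from the rows in this thread, not something the author states, and `Measurement` and `relativeDifference` below are hypothetical names used only for this sketch.

```kotlin
// One measurement as reported in the tables: average time and its error margin, same unit.
data class Measurement(val avg: Double, val std: Double)

// Returns the relative difference in percent (negative means the alternative is slower),
// or null when the [avg - std, avg + std] intervals overlap.
// ASSUMPTION: the overlap rule is inferred from the published rows, not a documented procedure.
fun relativeDifference(baseline: Measurement, alternative: Measurement): Double? {
    val overlap = baseline.avg - baseline.std <= alternative.avg + alternative.std &&
        alternative.avg - alternative.std <= baseline.avg + baseline.std
    if (overlap) return null
    return (baseline.avg - alternative.avg) / baseline.avg * 100.0
}

fun main() {
    // 'ascii', length 20, JDK11 row: the table reports -17.7 %.
    println(relativeDifference(Measurement(33.744, 0.229), Measurement(39.719, 0.476)))
    // '2bytes', length 200000, JDK11 row: the table reports N/A.
    println(relativeDifference(Measurement(777.784, 3.130), Measurement(806.668, 29.196)))
}
```

Under this reading, a value below -100 % (for example the 'sparse' rows) simply means the alternative took more than twice as long as the baseline.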

However, more up-to-date runtimes are capable of handling this new level of abstraction, resulting in the same or even better performance compared to what could be achieved with the code from the master branch.
Here are benchmarking results collected using JDK17:

tl;dr: a few benchmarks show some slowdown, but in general, performance either improved or remained at the same level.

Summary table for Utf8StringBenchmark
| Benchmark name | params | Baseline avgt | Baseline std | Alternative avgt | Alternative std | Relative difference, % |
| --- | --- | --- | --- | --- | --- | --- |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'ascii', 'length': '20', 'minGap': '128'} | 36.538 ns | ±0.117 ns | 37.500 ns | ±1.015 ns | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'ascii', 'length': '2000', 'minGap': '128'} | 1.573 us | ±0.005 us | 1.573 us | ±0.007 us | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'ascii', 'length': '200000', 'minGap': '128'} | 185.856 us | ±0.527 us | 175.881 us | ±0.700 us | 5.4 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'utf8', 'length': '20', 'minGap': '128'} | 87.617 ns | ±3.256 ns | 85.547 ns | ±2.477 ns | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'utf8', 'length': '2000', 'minGap': '128'} | 8.519 us | ±0.058 us | 9.400 us | ±0.038 us | -10.3 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'utf8', 'length': '200000', 'minGap': '128'} | 945.005 us | ±15.157 us | 1063.708 us | ±25.405 us | -12.6 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'sparse', 'length': '20', 'minGap': '128'} | 50.748 ns | ±0.749 ns | 51.544 ns | ±1.343 ns | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'sparse', 'length': '2000', 'minGap': '128'} | 1.867 us | ±0.338 us | 2.203 us | ±0.011 us | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'sparse', 'length': '200000', 'minGap': '128'} | 196.708 us | ±4.668 us | 244.525 us | ±0.999 us | -24.3 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '2bytes', 'length': '20', 'minGap': '128'} | 85.513 ns | ±0.233 ns | 82.183 ns | ±0.153 ns | 3.9 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '2bytes', 'length': '2000', 'minGap': '128'} | 6.691 us | ±0.025 us | 6.553 us | ±0.226 us | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '2bytes', 'length': '200000', 'minGap': '128'} | 822.836 us | ±1.886 us | 727.114 us | ±1.795 us | 11.6 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '3bytes', 'length': '20', 'minGap': '128'} | 111.390 ns | ±0.688 ns | 111.641 ns | ±0.874 ns | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '3bytes', 'length': '2000', 'minGap': '128'} | 8.594 us | ±0.040 us | 8.659 us | ±0.050 us | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '3bytes', 'length': '200000', 'minGap': '128'} | 950.357 us | ±4.860 us | 910.884 us | ±4.403 us | 4.2 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '4bytes', 'length': '20', 'minGap': '128'} | 76.016 ns | ±0.378 ns | 78.219 ns | ±0.432 ns | -2.9 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '4bytes', 'length': '2000', 'minGap': '128'} | 6.450 us | ±0.026 us | 6.461 us | ±0.041 us | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': '4bytes', 'length': '200000', 'minGap': '128'} | 743.317 us | ±1.659 us | 737.012 us | ±2.224 us | 0.8 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'bad', 'length': '20', 'minGap': '128'} | 74.214 ns | ±5.111 ns | 73.452 ns | ±0.433 ns | N/A |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'bad', 'length': '2000', 'minGap': '128'} | 5.663 us | ±0.027 us | 5.378 us | ±0.010 us | 5.0 % |
| kotlinx.io.benchmarks.Utf8StringBenchmark.benchmark | {'encoding': 'bad', 'length': '200000', 'minGap': '128'} | 682.448 us | ±4.172 us | 638.030 us | ±1.236 us | 6.5 % |

On Android, the results are somewhat similar.
Without R8, everything slows down (and much more dramatically than on the JDK):

Summary table for Utf8StringBenchmark
| Benchmark name | params | Baseline avgt | Baseline std | Alternative avgt | Alternative std | Relative difference, % |
| --- | --- | --- | --- | --- | --- | --- |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,ascii'} | 707.127 ns | ±107.469 ns | 881.119 ns | ±12.149 ns | -24.6 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,utf8'} | 1.013 us | ±0.026 us | 2.122 us | ±0.041 us | -109.5 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,sparse'} | 718.694 ns | ±11.497 ns | 1159.678 ns | ±45.395 ns | -61.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,2bytes'} | 1.139 us | ±0.028 us | 3.216 us | ±0.063 us | -182.3 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,3bytes'} | 1.329 us | ±0.028 us | 3.425 us | ±0.055 us | -157.7 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,4bytes'} | 1.010 us | ±0.016 us | 2.077 us | ±0.032 us | -105.7 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,bad'} | 1.001 us | ±0.078 us | 3.113 us | ±0.055 us | -211.0 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,ascii'} | 27.118 us | ±0.396 us | 44.016 us | ±0.638 us | -62.3 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,utf8'} | 76.915 us | ±1.619 us | 207.028 us | ±3.573 us | -169.2 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,sparse'} | 25.617 us | ±0.361 us | 43.542 us | ±0.786 us | -70.0 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,2bytes'} | 71.766 us | ±1.371 us | 246.694 us | ±3.465 us | -243.7 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,3bytes'} | 86.258 us | ±1.583 us | 198.968 us | ±4.008 us | -130.7 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,4bytes'} | 60.238 us | ±1.065 us | 151.510 us | ±1.481 us | -151.5 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,bad'} | 65.308 us | ±1.495 us | 262.164 us | ±4.260 us | -301.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,ascii'} | 2.722 ms | ±0.031 ms | 4.283 ms | ±0.041 ms | -57.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,utf8'} | 7.814 ms | ±0.119 ms | 19.688 ms | ±0.182 ms | -152.0 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,sparse'} | 2.698 ms | ±0.015 ms | 4.198 ms | ±0.050 ms | -55.6 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,2bytes'} | 7.155 ms | ±0.105 ms | 24.997 ms | ±0.348 ms | -249.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,3bytes'} | 8.653 ms | ±0.128 ms | 27.383 ms | ±0.427 ms | -216.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,4bytes'} | 5.959 ms | ±0.075 ms | 13.978 ms | ±0.230 ms | -134.6 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,bad'} | 6.616 ms | ±0.119 ms | 25.913 ms | ±0.397 ms | -291.7 % |

However, when R8 kicks in, results start looking similar to what was observed with recent JDK versions:

Summary table for Utf8StringBenchmark
| Benchmark name | params | Baseline avgt | Baseline std | Alternative avgt | Alternative std | Relative difference, % |
| --- | --- | --- | --- | --- | --- | --- |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,ascii'} | 672.708 ns | ±20.652 ns | 693.728 ns | ±7.396 ns | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,utf8'} | 966.507 ns | ±21.952 ns | 995.539 ns | ±14.733 ns | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,sparse'} | 787.546 ns | ±14.201 ns | 722.858 ns | ±71.736 ns | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,2bytes'} | 1.179 us | ±0.059 us | 1.133 us | ±0.016 us | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,3bytes'} | 1.393 us | ±0.015 us | 1.356 us | ±0.106 us | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,4bytes'} | 1.034 us | ±0.016 us | 1.021 us | ±0.022 us | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '20,bad'} | 1025.700 ns | ±24.620 ns | 991.733 ns | ±34.321 ns | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,ascii'} | 27.105 us | ±0.375 us | 28.692 us | ±0.404 us | -5.9 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,utf8'} | 83.571 us | ±1.455 us | 78.218 us | ±1.394 us | 6.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,sparse'} | 26.371 us | ±0.446 us | 26.997 us | ±0.421 us | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,2bytes'} | 77.654 us | ±1.731 us | 70.352 us | ±1.775 us | 9.4 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,3bytes'} | 94.021 us | ±2.418 us | 86.205 us | ±2.089 us | 8.3 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,4bytes'} | 61.468 us | ±1.345 us | 57.332 us | ±0.427 us | 6.7 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '2,000,bad'} | 68.067 us | ±1.615 us | 63.566 us | ±1.311 us | 6.6 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,ascii'} | 2.698 ms | ±0.029 ms | 2.858 ms | ±0.031 ms | -5.9 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,utf8'} | 7.891 ms | ±0.140 ms | 7.567 ms | ±0.139 ms | 4.1 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,sparse'} | 2.581 ms | ±0.030 ms | 2.710 ms | ±0.032 ms | -5.0 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,2bytes'} | 7.479 ms | ±0.183 ms | 6.870 ms | ±0.157 ms | 8.1 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,3bytes'} | 9.077 ms | ±0.185 ms | 8.532 ms | ±0.177 ms | 6.0 % |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,4bytes'} | 6.009 ms | ±0.105 ms | 5.890 ms | ±0.100 ms | N/A |
| kotlinx.io.benchmark.android.Utf8Benchmark.readWriteString | {'parameters': '200,000,bad'} | 7.050 ms | ±0.173 ms | 6.553 ms | ±0.326 ms | N/A |

To summarize: it's definitely hard to beat code that reads/writes directly to/from a byte array by wrapping these accesses into interface calls. But modern optimizing compilers are good at it and allow preserving the same performance level.
I could back out UTF8-related changes to avoid performance regressions for older runtimes, as these changes do not help us solve any issues right now. However, consolidating all access logic inside a segment allows us to continue experimenting with polymorphic segment types (a.k.a. "let's support ByteBuffer-backed segments").
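
For readers unfamiliar with the idea, here is a rough, JVM-only sketch of what polymorphic segment storage could look like once all byte access goes through one abstraction. `SegmentStorage`, `ArrayStorage`, `ByteBufferStorage`, `writeAscii`, and `readAscii` are hypothetical names invented for this illustration; they are not part of kotlinx-io and do not reflect how the library actually models segments.

```kotlin
import java.nio.ByteBuffer

// Hypothetical abstraction over a segment's bytes; names are illustrative only.
interface SegmentStorage {
    val size: Int
    fun get(index: Int): Byte
    fun set(index: Int, value: Byte)
}

// Backed by a plain byte array.
class ArrayStorage(private val array: ByteArray) : SegmentStorage {
    override val size: Int get() = array.size
    override fun get(index: Int): Byte = array[index]
    override fun set(index: Int, value: Byte) { array[index] = value }
}

// Backed by a java.nio.ByteBuffer (heap or direct).
class ByteBufferStorage(private val buffer: ByteBuffer) : SegmentStorage {
    override val size: Int get() = buffer.capacity()
    // Absolute get/put leave the buffer's position untouched.
    override fun get(index: Int): Byte = buffer.get(index)
    override fun set(index: Int, value: Byte) { buffer.put(index, value) }
}

// Code written against the interface works with either backing store.
fun writeAscii(storage: SegmentStorage, text: String): Int {
    val count = minOf(text.length, storage.size)
    for (i in 0 until count) storage.set(i, text[i].code.toByte())
    return count
}

fun readAscii(storage: SegmentStorage, count: Int): String =
    buildString { for (i in 0 until count) append(storage.get(i).toInt().toChar()) }

fun main() {
    val stores = listOf(
        ArrayStorage(ByteArray(16)),
        ByteBufferStorage(ByteBuffer.allocateDirect(16)),
    )
    for (store in stores) {
        val written = writeAscii(store, "hello")
        println(readAscii(store, written))
    }
}
```

Every read and write here is an interface call, which is the extra level of abstraction the benchmarks above are measuring; whether it costs anything depends on how well the runtime can devirtualize and inline those calls.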

lppedd commented Jul 3, 2024

@fzhinkin what about the perf impact on non-JVM platforms, like Native? Or aren't they impacted at all?

fzhinkin commented Jul 4, 2024

@lppedd, on Native, there's a marginal slowdown when it comes to UTF8 processing, but overall, things look good:

lppedd commented Jul 4, 2024

@fzhinkin thanks! Do you save previous benchmark results somewhere?
It would be cool to have a history of them to see the trend over time.

fzhinkin commented Jul 4, 2024

@lppedd, no, I don't, but it's worth saving. Thanks for the suggestion!
Historical data does not work well when it comes to performance comparison, and it's always better to rerun benchmarks for compared versions back to back. However, there are absolutely no reasons to abstain from keeping the results.

@fzhinkin fzhinkin requested a review from shanshin August 12, 2024 18:55
@fzhinkin fzhinkin merged commit 821b1bc into develop Aug 23, 2024
1 check passed
@fzhinkin fzhinkin deleted the bulk-api-part-3 branch August 23, 2024 14:10