Optimize DELTA_BYTE_ARRAY decoder in parquet reader #15923
Merged: raunaqmorarka merged 3 commits into trinodb:master from raunaqmorarka:pqr-delta-byte-array on Feb 2, 2023
sopel39
approved these changes
Feb 1, 2023
please also wait for @skrzypo987 approval
skrzypo987
approved these changes
Feb 1, 2023
raunaqmorarka force-pushed the pqr-delta-byte-array branch from 98bcf63 to 95e0b83 on February 2, 2023 06:17
Benchmark                         (positionLength)  (type)                             Mode   Cnt  Before             After               Units
BenchmarkBinaryColumnReader.read  VARIABLE_0_100    UNBOUNDED                          thrpt  10   6.772 ± 0.282      63.404 ± 1.339      ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_100    VARCHAR_ASCII_BOUND_EXACT          thrpt  10   6.197 ± 1.045      61.093 ± 0.756      ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_100    CHAR_ASCII_BOUND_HALF              thrpt  10   6.581 ± 0.559      20.286 ± 3.499      ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_100    CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt  10   7.107 ± 0.129      20.926 ± 1.483      ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_1000   UNBOUNDED                          thrpt  10   1764.530 ± 65.214  8311.994 ± 417.798  ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_1000   VARCHAR_ASCII_BOUND_EXACT          thrpt  10   1615.245 ± 70.177  7364.618 ± 220.423  ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_1000   CHAR_ASCII_BOUND_HALF              thrpt  10   1601.118 ± 46.370  3460.392 ± 114.178  ops/s
BenchmarkBinaryColumnReader.read  VARIABLE_0_1000   CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt  10   1315.271 ± 74.272  3522.025 ± 137.662  ops/s
BenchmarkBinaryColumnReader.read  FIXED_10          UNBOUNDED                          thrpt  10   12.629 ± 0.554     138.457 ± 4.436     ops/s
BenchmarkBinaryColumnReader.read  FIXED_10          VARCHAR_ASCII_BOUND_EXACT          thrpt  10   9.612 ± 3.404      125.494 ± 3.186     ops/s
BenchmarkBinaryColumnReader.read  FIXED_10          CHAR_ASCII_BOUND_HALF              thrpt  10   10.112 ± 0.340     39.157 ± 5.328      ops/s
BenchmarkBinaryColumnReader.read  FIXED_10          CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt  10   10.293 ± 0.774     37.144 ± 6.790      ops/s
BenchmarkBinaryColumnReader.read  FIXED_100         UNBOUNDED                          thrpt  10   806.429 ± 12.979   5250.974 ± 96.183   ops/s
BenchmarkBinaryColumnReader.read  FIXED_100         VARCHAR_ASCII_BOUND_EXACT          thrpt  10   798.022 ± 12.012   5383.613 ± 251.305  ops/s
BenchmarkBinaryColumnReader.read  FIXED_100         CHAR_ASCII_BOUND_HALF              thrpt  10   644.532 ± 28.102   2031.493 ± 106.947  ops/s
BenchmarkBinaryColumnReader.read  FIXED_100         CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt  10   676.331 ± 39.974   2014.215 ± 134.481  ops/s

Co-authored-by: Krzysztof Skrzypczynski <krzysztof.skrzypczynski@starburstdata.com>

Benchmark                               (byteArrayLength)  Mode   Cnt  Before           After            Units
BenchmarkShortDecimalColumnReader.read  1                  thrpt  10   19.552 ± 2.7966  109.689 ± 3.551  ops/s
BenchmarkShortDecimalColumnReader.read  2                  thrpt  10   19.388 ± 0.7022  77.033 ± 4.054   ops/s
BenchmarkShortDecimalColumnReader.read  3                  thrpt  10   16.082 ± 1.4900  59.217 ± 3.118   ops/s
BenchmarkShortDecimalColumnReader.read  4                  thrpt  10   20.012 ± 1.3366  73.665 ± 3.047   ops/s
BenchmarkShortDecimalColumnReader.read  5                  thrpt  10   16.827 ± 2.2422  55.817 ± 5.022   ops/s
BenchmarkShortDecimalColumnReader.read  6                  thrpt  10   20.799 ± 0.1855  66.127 ± 1.533   ops/s
BenchmarkShortDecimalColumnReader.read  7                  thrpt  10   16.956 ± 1.0444  54.469 ± 3.195   ops/s
BenchmarkShortDecimalColumnReader.read  8                  thrpt  10   14.576 ± 2.4777  48.632 ± 1.844   ops/s

Benchmark                              Mode   Cnt  Before          After           Units
BenchmarkLongDecimalColumnReader.read  thrpt  20   19.669 ± 1.816  37.417 ± 0.829  ops/s

Benchmark                       Mode   Cnt  Before          After           Units
BenchmarkUuidColumnReader.read  thrpt  20   22.372 ± 1.186  79.362 ± 5.013  ops/s

raunaqmorarka force-pushed the pqr-delta-byte-array branch from 95e0b83 to 6ddeb52 on February 2, 2023 06:20
Description
Optimize the DELTA_BYTE_ARRAY decoder in the Parquet reader for the BINARY and FIXED_LEN_BYTE_ARRAY Parquet types.
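
For readers unfamiliar with the encoding, here is a minimal sketch of how DELTA_BYTE_ARRAY values are reconstructed: each value is the first prefixLength bytes of the previous value followed by its own suffix. This is not the code from this PR. The class name DeltaByteArraySketch, the Decoded holder, and the assumption that prefix lengths, suffix lengths, and suffix bytes have already been unpacked into plain arrays are all hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of DELTA_BYTE_ARRAY reconstruction; not Trino's decoder.
final class DeltaByteArraySketch
{
    // Flat result: value i occupies bytes[offsets[i]] (inclusive) to bytes[offsets[i + 1]] (exclusive).
    static final class Decoded
    {
        final byte[] bytes;
        final int[] offsets;

        Decoded(byte[] bytes, int[] offsets)
        {
            this.bytes = bytes;
            this.offsets = offsets;
        }

        String get(int i)
        {
            return new String(bytes, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
        }
    }

    // Assumes the prefix lengths, suffix lengths, and concatenated suffix bytes
    // were already unpacked from the page into plain arrays.
    static Decoded decode(int[] prefixLengths, int[] suffixLengths, byte[] suffixes)
    {
        int count = prefixLengths.length;
        if (suffixLengths.length != count) {
            throw new IllegalArgumentException("prefix and suffix length counts differ");
        }

        // First pass: validate lengths and compute where each value starts in the output.
        int[] offsets = new int[count + 1];
        int previousLength = 0;
        for (int i = 0; i < count; i++) {
            int prefix = prefixLengths[i];
            int suffix = suffixLengths[i];
            if (prefix < 0 || suffix < 0 || prefix > previousLength) {
                throw new IllegalArgumentException("invalid lengths at position " + i);
            }
            previousLength = Math.addExact(prefix, suffix);
            offsets[i + 1] = Math.addExact(offsets[i], previousLength);
        }

        // Second pass: copy the shared prefix from the previous value, then append the suffix.
        byte[] out = new byte[offsets[count]];
        int suffixPosition = 0;
        for (int i = 0; i < count; i++) {
            int prefix = prefixLengths[i];
            if (prefix > 0) {
                System.arraycopy(out, offsets[i - 1], out, offsets[i], prefix);
            }
            System.arraycopy(suffixes, suffixPosition, out, offsets[i] + prefix, suffixLengths[i]);
            suffixPosition += suffixLengths[i];
        }
        return new Decoded(out, offsets);
    }
}
```

In this sketch all values land in one flat byte array with an offsets array, so no per-value object is allocated. Whether the PR's decoder uses the same layout is not stated in the description above.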
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: