perf: Start on better Parquet delta decoding #18049
8e6325f to 221fb44
I got quite far on this today. Although CI no longer shows any failures, fuzzing still breaks it. I am hoping to fix that tomorrow and get this ready for merging.
CodSpeed Performance Report: Merging #18049 will not alter performance.
I did quite extensive testing for this PR, with many fuzz tests and integration tests against pyarrow's Parquet writer. While I cannot guarantee that it is bug-free, I am pretty confident it is close to bug-free.
Alright!
Since Delta Length Byte Array will be the default encoding for strings, we want to speed up its decoding and bring it up to par with the other encodings. This starts the work of implementing `collect`, `sum` and `skip` methods and removing the allocations that it currently does.
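
To illustrate why `sum` and `skip` matter here: in Delta Length Byte Array, the string lengths are delta-encoded up front and the string bytes follow as one concatenated buffer, so skipping `n` strings only requires the sum of the next `n` lengths. The sketch below is a simplified, hypothetical decoder and not the code in this PR. The names `DeltaDecoder`, `advance`, `collect_into` and `nth_string` are made up for illustration, and it assumes the deltas are already unpacked into a slice (real Parquet bit-packs them in blocks and miniblocks with a per-block minimum delta).

```rust
/// Hypothetical, simplified delta decoder (not the Polars implementation).
/// Yields `first`, then `first + deltas[0]`, then `first + deltas[0] + deltas[1]`, ...
pub struct DeltaDecoder<'a> {
    next_value: i64,
    deltas: &'a [i64],
    remaining: usize,
}

impl<'a> DeltaDecoder<'a> {
    pub fn new(first: i64, deltas: &'a [i64]) -> Self {
        Self { next_value: first, deltas, remaining: deltas.len() + 1 }
    }

    /// Number of values not yet consumed.
    pub fn len(&self) -> usize {
        self.remaining
    }

    /// Produce the next value, or `None` when exhausted.
    pub fn advance(&mut self) -> Option<i64> {
        if self.remaining == 0 {
            return None;
        }
        let value = self.next_value;
        self.remaining -= 1;
        if let Some((delta, rest)) = self.deltas.split_first() {
            self.next_value += *delta;
            self.deltas = rest;
        }
        Some(value)
    }

    /// Consume up to `n` values and return their sum, without materializing them.
    pub fn sum(&mut self, n: usize) -> i64 {
        let mut total = 0;
        for _ in 0..n {
            match self.advance() {
                Some(v) => total += v,
                None => break,
            }
        }
        total
    }

    /// Consume up to `n` values and discard them.
    pub fn skip(&mut self, n: usize) {
        for _ in 0..n {
            if self.advance().is_none() {
                break;
            }
        }
    }

    /// Append up to `n` values to a caller-provided buffer, so the decoder
    /// itself never allocates a fresh `Vec`.
    pub fn collect_into(&mut self, n: usize, out: &mut Vec<i64>) {
        out.reserve(n.min(self.remaining));
        for _ in 0..n {
            match self.advance() {
                Some(v) => out.push(v),
                None => break,
            }
        }
    }
}

/// Fetch the `n`-th remaining string: the byte offset is the sum of the
/// `n` lengths before it, so no intermediate length buffer is needed.
pub fn nth_string<'d>(
    lengths: &mut DeltaDecoder<'_>,
    data: &'d [u8],
    n: usize,
) -> Option<&'d [u8]> {
    let offset = usize::try_from(lengths.sum(n)).ok()?;
    let len = usize::try_from(lengths.advance()?).ok()?;
    data.get(offset..offset.checked_add(len)?)
}
```

With lengths 5, 3, 6 (first value 5, deltas -2 and 3) over the bytes `hellofooparrot`, `nth_string(&mut lengths, data, 2)` sums the first two lengths to get offset 8 and then slices out the third string, without collecting the lengths into a temporary vector.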