Use aircompressor codecs #3575
Fixes deephaven#3470. Verified deephaven#2495 and deephaven#2569 still work, as well as LZO-compressed parquet files.
Nightlies all pass.
Looks good from here - please verify on arm64.
@abaranec do you have time to take a look at this too?
I'll get these all run on M1 before merge.
I've verified on M1 that the new tests pass, and have also verified reading large parquet files in GZIP, LZO, LZ4, SNAPPY, and ZSTD.
Updates to use the aircompressor ZSTD codec, which does not suffer from the same off-heap leak that our own implementation does. Fixes #3470.
Additionally, this PR more explicitly configures the order of codecs using org.apache.hadoop.conf.Configuration, and ensures we use the aircompressor GZIP, LZO, and LZ4 codecs. The structure has also been updated to give us explicit control over mapping custom org.apache.hadoop.io.compress.CompressionCodec to the appropriate org.apache.parquet.hadoop.metadata.CompressionCodecName.
Verified #2495 and #2569 still work, as well as LZO-compressed parquet files.
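For readers who want a picture of what "explicit order plus explicit mapping" could look like, here is a minimal, dependency-free sketch. It is not the code from this PR. The class CodecOrderSketch, the local CodecName enum (a stand-in for org.apache.parquet.hadoop.metadata.CompressionCodecName), and the specific aircompressor class names are all assumptions made for illustration. The only external detail relied on is that Hadoop reads a comma-separated list of codec classes from the io.compression.codecs property.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not the implementation from this PR.
final class CodecOrderSketch {
    // Local stand-in for org.apache.parquet.hadoop.metadata.CompressionCodecName.
    enum CodecName { ZSTD, GZIP, LZO, LZ4 }

    // LinkedHashMap keeps insertion order, so the order written here is the
    // order the codecs would be listed in the configuration value.
    // The class names are assumed aircompressor codec class names.
    private static final Map<String, CodecName> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put("io.airlift.compress.zstd.ZstdCodec", CodecName.ZSTD);
        CODECS.put("io.airlift.compress.gzip.JdkGzipCodec", CodecName.GZIP);
        CODECS.put("io.airlift.compress.lzo.LzoCodec", CodecName.LZO);
        CODECS.put("io.airlift.compress.lz4.Lz4Codec", CodecName.LZ4);
    }

    // Builds the comma-separated value one might set for "io.compression.codecs"
    // on an org.apache.hadoop.conf.Configuration.
    static String codecsProperty() {
        return String.join(",", CODECS.keySet());
    }

    // Explicit lookup from a codec class name to its parquet codec name;
    // empty when the class is not one we registered.
    static Optional<CodecName> nameFor(String codecClassName) {
        return Optional.ofNullable(CODECS.get(codecClassName));
    }

    public static void main(String[] args) {
        System.out.println("io.compression.codecs=" + codecsProperty());
        System.out.println(nameFor("io.airlift.compress.lzo.LzoCodec"));
    }
}
```

Keeping the order and the mapping in one ordered table means a codec cannot be registered without also stating which parquet codec name it serves, which is the kind of explicit control the description refers to.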