Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use aircompressor codecs #3575

Merged
merged 6 commits into from
Mar 31, 2023
Merged

Conversation

devinrsmith
Copy link
Member

@devinrsmith devinrsmith commented Mar 20, 2023

Updates to use the aircompressor ZSTD codec, which does not suffer from the same off-heap leak that our own implementation does. Fixes #3470

Additionally, this PR more explicitly configures the order of codecs using org.apache.hadoop.conf.Configuration, and ensures we use the aircompressor GZIP, LZO, and LZ4 codecs. The structure has also been updated to give us explicit control over mapping custom org.apache.hadoop.io.compress.CompressionCodec to the appropriate org.apache.parquet.hadoop.metadata.CompressionCodecName.

Verified #2495 and #2569 still work, as well as LZO compressed parquet files

Fixes deephaven#3470

Verified deephaven#2495 and deephaven#2569 still work, as well as LZO compressed parquet files
@devinrsmith devinrsmith added parquet Related to the Parquet integration NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed. labels Mar 20, 2023
@devinrsmith devinrsmith added this to the Mar 2023 milestone Mar 20, 2023
@devinrsmith devinrsmith self-assigned this Mar 20, 2023
@devinrsmith devinrsmith changed the title Update to use aircompressor ZstdCodec and LzopCodec Update to use aircompressor ZstdCodec Mar 21, 2023
@devinrsmith devinrsmith mentioned this pull request Mar 21, 2023
3 tasks
@devinrsmith devinrsmith changed the title Update to use aircompressor ZstdCodec Use aircompressor codecs Mar 23, 2023
@devinrsmith devinrsmith marked this pull request as ready for review March 23, 2023 19:24
@devinrsmith
Copy link
Member Author

Nightlies all pass.

@niloc132 niloc132 requested a review from abaranec March 23, 2023 19:59
Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from here - please verify on arm64.

@abaranec do you have time to take a look at this too?

@devinrsmith
Copy link
Member Author

I'll get these all run on M1 before merge.

@devinrsmith
Copy link
Member Author

I've verified on M1 that the new tests pass, and have also verified reading large parquet files in GZIP, LZO, LZ4, SNAPPY, and ZSTD.

@devinrsmith devinrsmith merged commit 2d93e99 into deephaven:main Mar 31, 2023
@devinrsmith devinrsmith deleted the nightly/fix-zstd-leak branch March 31, 2023 15:53
@github-actions github-actions bot locked and limited conversation to collaborators Mar 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed. parquet Related to the Parquet integration version-bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet zstd off-heap memory leak
4 participants