Use aircompressor codecs #3575

devinrsmith · 2023-03-20T17:40:36Z

Updates to use the aircompressor ZSTD codec, which does not suffer from the off-heap leak that our own implementation has. Fixes #3470.

Additionally, this PR more explicitly configures the order of codecs using org.apache.hadoop.conf.Configuration, and ensures we use the aircompressor GZIP, LZO, and LZ4 codecs. The structure has also been updated to give us explicit control over mapping custom org.apache.hadoop.io.compress.CompressionCodec to the appropriate org.apache.parquet.hadoop.metadata.CompressionCodecName.
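For readers unfamiliar with the approach, here is a minimal, stdlib-only sketch of the idea, not the code from this PR. It keeps an ordered map from codec class names to Parquet codec names and derives the comma-separated value that Hadoop reads from its `io.compression.codecs` property. The class `CodecOrderSketch`, its methods, and the order shown are hypothetical. The `io.airlift.compress.*` class names are assumed to match the aircompressor Hadoop codecs, and the string values stand in for `CompressionCodecName` constants.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: models the codec ordering and mapping with plain strings. */
final class CodecOrderSketch {
    // Hadoop property that lists codec classes, comma-separated.
    static final String IO_COMPRESSION_CODECS = "io.compression.codecs";

    // Insertion order is the configured order (the order here is an assumption).
    // Keys: assumed aircompressor codec class names.
    // Values: names mirroring CompressionCodecName constants.
    static final Map<String, String> CODEC_TO_NAME = new LinkedHashMap<>();
    static {
        CODEC_TO_NAME.put("io.airlift.compress.zstd.ZstdCodec", "ZSTD");
        CODEC_TO_NAME.put("io.airlift.compress.gzip.JdkGzipCodec", "GZIP");
        CODEC_TO_NAME.put("io.airlift.compress.lzo.LzoCodec", "LZO");
        CODEC_TO_NAME.put("io.airlift.compress.lz4.Lz4Codec", "LZ4");
    }

    /** Builds the value to store under IO_COMPRESSION_CODECS. */
    static String codecsProperty() {
        return String.join(",", CODEC_TO_NAME.keySet());
    }

    /** Explicit lookup from a codec class name to its Parquet codec name. */
    static Optional<String> codecNameFor(String codecClassName) {
        return Optional.ofNullable(CODEC_TO_NAME.get(codecClassName));
    }

    public static void main(String[] args) {
        System.out.println(IO_COMPRESSION_CODECS + "=" + codecsProperty());
        System.out.println(codecNameFor("io.airlift.compress.zstd.ZstdCodec").orElse("UNKNOWN"));
    }
}
```

In real code the property value would be set on an `org.apache.hadoop.conf.Configuration` instance, and the map values would be `CompressionCodecName` constants rather than strings. Keeping the mapping in one explicit table means an unrecognized codec class yields an empty result instead of a silently guessed name.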

Verified that #2495 and #2569 still work, as do LZO-compressed Parquet files.

devinrsmith · 2023-03-23T19:24:55Z

Nightlies all pass.

niloc132

Looks good from here - please verify on arm64.

@abaranec do you have time to take a look at this too?

devinrsmith · 2023-03-31T14:37:54Z

I'll get these all run on M1 before merge.

devinrsmith · 2023-03-31T15:52:57Z

I've verified on M1 that the new tests pass, and have also verified reading large parquet files in GZIP, LZO, LZ4, SNAPPY, and ZSTD.

Update to use aircompressor ZstdCodec and LzopCodec

6b39f1f

Fixes deephaven#3470 Verified deephaven#2495 and deephaven#2569 still work, as well as LZO compressed parquet files

devinrsmith added the parquet, NoDocumentationNeeded, and NoReleaseNotesNeeded labels Mar 20, 2023

devinrsmith added this to the Mar 2023 milestone Mar 20, 2023

devinrsmith self-assigned this Mar 20, 2023

Configure codecs

f3aa033

devinrsmith changed the title ~~Update to use aircompressor ZstdCodec and LzopCodec~~ Update to use aircompressor ZstdCodec Mar 21, 2023

devinrsmith mentioned this pull request Mar 21, 2023

Parquet compression followup #2920

Closed

3 tasks

devinrsmith added 3 commits March 22, 2023 11:09

Merge remote-tracking branch 'upstream/main' into nightly/fix-zstd-leak

1c1a01a

Parquet file tests

02d8d1a

Update tickets

55e46ad

devinrsmith changed the title ~~Update to use aircompressor ZstdCodec~~ Use aircompressor codecs Mar 23, 2023

devinrsmith added the version-bump label Mar 23, 2023

devinrsmith marked this pull request as ready for review March 23, 2023 19:24

devinrsmith requested review from chipkent, jmao-denver and rcaudy as code owners March 23, 2023 19:24

devinrsmith requested a review from niloc132 March 23, 2023 19:33

null name check

20adab5

niloc132 requested a review from abaranec March 23, 2023 19:59

chipkent approved these changes Mar 24, 2023

niloc132 approved these changes Mar 30, 2023

abaranec approved these changes Mar 31, 2023

devinrsmith merged commit 2d93e99 into deephaven:main Mar 31, 2023

devinrsmith deleted the nightly/fix-zstd-leak branch March 31, 2023 15:53

github-actions bot locked and limited conversation to collaborators Mar 31, 2023
