Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-6365] Add ZStandard compression support for Java SDK #7416

Closed
wants to merge 1 commit into from

Conversation

jklukas
Copy link
Contributor

@jklukas jklukas commented Jan 4, 2019

Adds a dependency on zstd-jni which declares no transitive dependencies,
but otherwise just fills out enum values to expose the ZStandard support
that already exists in commons-compress.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---

Adds a dependency on `zstd-jni` which declares no transitive dependencies,
but otherwise just fills out enum values to expose the ZStandard support
that already exists in commons-compress.
@jklukas
Copy link
Contributor Author

jklukas commented Jan 4, 2019

cc @jkff who originally merged Compression.java.

@jklukas
Copy link
Contributor Author

jklukas commented Jan 4, 2019

Run Python PreCommit

Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome, super clean implementation nothing to say 👏

/** @see Compression#ZSTD */
ZSTD(Compression.ZSTD),

/** @see Compression#DEFLATE */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@@ -267,7 +290,7 @@ public void testCompressedAccordingToFilepatternGzip() throws Exception {
verifyReadContents(input, tmpFile, null /* default auto decompression factory */);
}

/** Test reading according to filepattern when the file is gzipped. */
/** Test reading according to filepattern when the file is bzipped. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other good catch good.

@iemejia
Copy link
Member

iemejia commented Jan 9, 2019

This adds a new dep for sdks/java/core. It is a pity that we don't have an extensible mechanism for new compression schemas. Has somebody proposed or discussed this @kennknowles? As usual I am afraid of possible conflict on dependencies.

@iemejia
Copy link
Member

iemejia commented Jan 9, 2019

I will wait for the extra comment but probably will merge it ASAP. Thanks @jklukas !

@jklukas
Copy link
Contributor Author

jklukas commented Jan 9, 2019

We could move zstd-jni to a test dependency and document that users will need to add that dependency on their own. If the lib isn't present, you'll see runtime errors about a class not being found. We could catch that error and output a message that users are expected to provide zstd-jni.

@iemejia
Copy link
Member

iemejia commented Jan 10, 2019

I think that the test idea brings a bit too much complexity. I was thinking more about having something like a Registrar like we do for FileSystems or other service like extensible parts.

@iemejia
Copy link
Member

iemejia commented Jan 11, 2019

Any comment on this one @kennknowles WDYT do we take the extension like approach or we merge as it is? Notice that zstd jar dep uses also JNI.

@kennknowles
Copy link
Member

Is it marked experimental? I think most new features should be, for a while. It might be hard to mark the right pieces of code, given how this fits in, but still seems a good idea.

With that, I think as-is is fine. The implementation and tests are consistent with what we already have. I do think moving to an extension model would be a good improvement, and we can probably do it to all the pieces of the enum at the same time and this will not make it much harder.

@iemejia
Copy link
Member

iemejia commented Jan 14, 2019

@jklukas I created BEAM-6422 to create an extensible mechanism for compression. I will tackle this during the week and ping you when ready so you can rebase and add zstd as the first use case of it.
Sorry for the delay/inconvenience but it is just to have a leaner core.

@jklukas
Copy link
Contributor Author

jklukas commented Jan 14, 2019

I will tackle this during the week and ping you when ready so you can rebase and add zstd as the first use case of it

That sounds great.

@iemejia
Copy link
Member

iemejia commented Feb 8, 2019

@jklukas I have been considering this PR since it does not add a compile time dependency (it 'only' depends on commons-compress) that we should better rely on commons-compress as in the middle mechanism, and make zstd-jni optional at some moment if we find clashes with other versions. So I am going to merge this one.
Thanks a lot for implementing this and sorry if the discussion made the merge take longer.

Copy link
Member

@iemejia iemejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will do some minor adjustments and merge manually.

iemejia added a commit that referenced this pull request Feb 8, 2019
@iemejia
Copy link
Member

iemejia commented Feb 8, 2019

Merged !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants