Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2954] update shade configurations in extension/sql #3848

Closed
wants to merge 2 commits into from

Conversation

mingmxu
Copy link

@mingmxu mingmxu commented Sep 13, 2017

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

@mingmxu
Copy link
Author

mingmxu commented Sep 13, 2017

R: @aviemzur

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 69.483% when pulling 522e6c1 on XuMingmin:BEAM-2954 into 50532f0 on apache:master.

@aviemzur
Copy link
Member

aviemzur commented Sep 13, 2017

The shading config looks like it should work.

However, we need to make sure we can include the classes from these artifacts as part of our distribution, based on their license.

Calcite is Apache License 2.0 so should be fine
Janino is BSD license https://raw.githubusercontent.com/janino-compiler/janino/master/LICENSE

@davorbonaci

@davorbonaci
Copy link
Member

Thanks @aviemzur.

Some licensing comments:

  • Anything that is included in our jar should be relocated, so there's no classpath collision if some other dependency uses the same thing. On the high level, our jars shouldn't include any classes outside org.apache.beam.<component>, because everything is relocated.
  • There should be a corresponding SurfaceTest to make sure we are not relocating something that is on the API surface, hence making it impossible to relocate (this may already exist -- didn't check).
  • Janino copyright notice should to be copied into our NOTICE with text "Parts of the software are licensed as" phrasing. Calcite is probably fine, unless they have something transitive that may need to be included as well. See documentation on the www.apache.org for exact requirements.

Happy to help -- let me know if you need something else.

@mingmxu
Copy link
Author

mingmxu commented Sep 14, 2017

@davorbonaci thanks for comments.

For #3, could you help with a draft for the NOTICE?

@mingmxu
Copy link
Author

mingmxu commented Sep 14, 2017

Based on all my tests, here’re the yes or no cases:

  1. opt-out shade [YES]
  2. shade without relocation [YES]
  3. shade with relocation [NO]

Note:
a). license concern is not addressed here;
b). For 3). static class name used in code, see org.codehaus.commons.compiler.CompilerFactoryFactory#L56;

Given the choices, I would prefer 1 > 2, no 3;

@aviemzur @davorbonaci any comments?

@aviemzur
Copy link
Member

As mentioned in the ticket we cannot opt-out of shading in this module as we cannot leak Guava to our users.
We should find the way to shade and relocate correctly.

@davorbonaci
Copy link
Member

+1 to shading and relocating

@mingmxu
Copy link
Author

mingmxu commented Sep 15, 2017

Absolutely open to any workable solution. According to my test(I may miss something, and is asking @aviemzur to rerun my test code), shade+relocate fails as some code hard code java class path(like below).

InputStream is         = Thread.currentThread().getContextClassLoader().getResourceAsStream(
                "org.codehaus.commons.compiler.properties"

If none of them are workable/acceptable, the last option would be remove guava in sql module, thoughts?

@mingmxu
Copy link
Author

mingmxu commented Sep 16, 2017

update relocation settings in shade, protobuf is shaded as well.

Also add NOTICE and Janino license for distribution. @davorbonaci @aviemzur please help with a review to make sure we follow the license requirement.

@mingmxu
Copy link
Author

mingmxu commented Sep 17, 2017

retest this please

@mingmxu
Copy link
Author

mingmxu commented Sep 18, 2017

CC @reuvenlax @jbonofre
Any comments on the license info?

@mingmxu
Copy link
Author

mingmxu commented Sep 20, 2017

retest this please

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 69.543% when pulling ed07fe1 on XuMingmin:BEAM-2954 into bd0facc on apache:master.

@jbonofre
Copy link
Member

R: @jbonofre

@mingmxu
Copy link
Author

mingmxu commented Sep 21, 2017

hey guys, any updates?

@reuvenlax
Copy link
Contributor

License looks good to me. @aviemzur @davorbonaci does the shading look ok?

@aviemzur
Copy link
Member

@jbonofre @davorbonaci @reuvenlax re: licenses, looks like Protobuf was added to the shading as well

Suggestion: Instead of adding a org.apache.beam.sdks.java.extensions.sql.repackaged.org.codehaus.commons.compiler.properties file, see if you can instead relocate the original file using shade plugin.

@mingmxu
Copy link
Author

mingmxu commented Sep 25, 2017

@aviemzur, do you mean the settings as below?

<relocation>
  <pattern>org.codehaus</pattern>
  <shadedPattern>
    org.apache.${renderedArtifactId}.repackaged.org.codehaus
  </shadedPattern>
  <excludes>org.codehaus.janino.CompilerFactory</excludes>
</relocation>
<relocation>
  <pattern>org.codehaus.commons.compiler.properties</pattern>
  <shadedPattern>org.apache.beam.sdks.java.extensions.sql.repackaged.org.codehaus.commons.compiler.properties</shadedPattern>
  <rawString>true</rawString>
</relocation>

A org.apache.beam.sdks.java.extensions.sql.repackaged.org.codehaus.commons.compiler.properties file may be better as the shade configuration above doesn't relocate file org.codehaus.janino.CompilerFactory janino.

@reuvenlax
Copy link
Contributor

FYI this PR is one of the last remaining blockers for the 2.2.0 cut, so please let me know if you need any more help!

@mingmxu
Copy link
Author

mingmxu commented Sep 26, 2017

IMO the current shade configuration is good to go. I can make a trick as previous comment to avoid file org.apache.beam.sdks.java.extensions.sql.repackaged.org.codehaus.commons.compiler.properties, but I don't think that's a good option as it introduces non-relocated class files.

@aviemzur let me know if anything I missed?

@kennknowles
Copy link
Member

The configuration looks OK, but I agree that if possible you should use shading rather than add the properties file. Taking a peek at the contents of the jar is helpful.

I am surprised there are no changes to the BeamSqlApiSurfaceTest. That makes me think the test wasn't working.

@kennknowles
Copy link
Member

Taking a look, you probably want a couple more surface tests. The test you have is OK for ensuring that the actual public API surface is still going to work, but it does not ensure that you don't have forbidden residual dependencies.

I actually think it needs to be an IT to operate on the shaded jar, right? And then it should test ApiSurface.ofPackage(...) to ensure that Guava & Calcite do not appear anywhere.

@kennknowles
Copy link
Member

To add some more context:

  • Some API surface tests check the current class's classloader. I think this was an attempt to get transitive dependencies. I don't recall how it works but I worry it might miss things.
  • Testing the whole package is better than a particular class or selection of classes. If you use Guava and Calcite on public methods it is going to fail. That isn't really wrong - you have shaded things on public API. So you need to whitelist/exclude subpackages and classes that are marked @Internal. If this is too hard, then communicating what is the real API to users is also too hard.

@mingmxu
Copy link
Author

mingmxu commented Sep 27, 2017

let's summary the discussions together:

  1. the shade configuration

I intent to include that file. If we want to avoid it, I suppose my previous comment is one option, which would package a non-relocated class file org.codehaus.janino.CompilerFactory; --Please share if other solutions;

  1. add IT test

Is WordCountIT the right example? I think BeamSqlExample can be used. Thoughts?

  1. API surface

The existing test lists the 4 public classes, except org.apache.beam.sdk.extensions.sql.impl which are internal classes regarding to implementation. --I'm not sure whether it can cover the scenarios of shade.
CC @xumingming for more comments

@mingmxu
Copy link
Author

mingmxu commented Sep 28, 2017

FYI, create https://issues.apache.org/jira/browse/BEAM-2998 for 2)

@reuvenlax
Copy link
Contributor

So what's left here? Do we still need more tests?

@mingmxu
Copy link
Author

mingmxu commented Sep 28, 2017

can our CI job run on shade libraries? I take a quick look at job_beam_PostCommit_Java_MavenInstall.groovy and job_beam_PreCommit_Java_MavenInstall.groovy, not support?

If it supports, can anyone take the task or share some notes on how to add a IT test?

@kennknowles
Copy link
Member

I don't understand what is going on with the properties file. It seems like you had a way to make it automated with the shade plugin.

@mingmxu
Copy link
Author

mingmxu commented Sep 28, 2017

while, let me make it clear about the properties file, now we have two choices:

  1. add the properties file; (shown in current PR)
  2. shade the properties file, and do not relocate class org.codehaus.janino.CompilerFactory

I prefer to 1), mostly because I don't want to have a non-relocated class file;

@kennknowles
Copy link
Member

Can we do both?

@mingmxu
Copy link
Author

mingmxu commented Sep 28, 2017

Do you mean the two shade options? I don't think so, they're mutually exclusive.

@kennknowles
Copy link
Member

Yes, I mean to have the identical result in the built jar, but automated.

@mingmxu
Copy link
Author

mingmxu commented Sep 28, 2017

IMO it cannot be automated, as shade plugin doesn't check and update the properties file.

@reuvenlax
Copy link
Contributor

any conclusion here?

@reuvenlax
Copy link
Contributor

Also how critical is this for 2.2.0, as I would like to get a cut out.

@mingmxu
Copy link
Author

mingmxu commented Oct 2, 2017

I would label it critical as I see failures when running queries which have join/aggregation.

(Test environment: Kafka+MySQL+FlinkRunner)

@kennknowles
Copy link
Member

The lack of automation is an issue for maintainability, but doesn't affect users. So we should merge it for the release and file a ticket against 2.3.0 to find a way to automate it.

@kennknowles
Copy link
Member

We need to explore the world of shading configs and perhaps other resource-manipulating maven plugins so that we aren't making funky hacked versions of properties files for our dependencies.

@mingmxu
Copy link
Author

mingmxu commented Oct 2, 2017

Agree with an automate configuration (also how to test shaded libraries), this is a workaround that doesn't impact end users, we could look for other options later.

@reuvenlax
Copy link
Contributor

ok, I'll go ahead and merge. Please file a JIRA for automation.

@aviemzur
Copy link
Member

aviemzur commented Oct 3, 2017

Retest this please.

@asfgit asfgit closed this in baa6ebc Oct 3, 2017
@mingmxu mingmxu deleted the BEAM-2954 branch October 3, 2017 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants