-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4281][Build] Package Yarn shuffle service into its own jar #3147
Conversation
This allows make-distribution to create a small uber jar for the network-yarn module, such that all uses of the Yarn shuffle service can just drop this jar onto the NM classpath and start the shuffle service after configuring the NM to include it.
Test build #23037 has started for PR 3147 at commit
|
I have tested the changes in both maven and SBT. |
Test build #23037 has finished for PR 3147 at commit
|
Test PASSed. |
@@ -41,12 +41,12 @@ | |||
<groupId>io.netty</groupId> | |||
<artifactId>netty-all</artifactId> | |||
</dependency> | |||
|
|||
<!-- Provided dependencies --> | |||
<dependency> | |||
<groupId>org.slf4j</groupId> | |||
<artifactId>slf4j-api</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you move the comment up here? Should this be at the provided scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, actually Yarn already provides slf4j so it doesn't need to be a core dependency. For standalone mode, this is also already required by Spark so it doesn't need to be a core dependency there either. HOWEVER I just realized I forgot to actually make it provided by adding the tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I added the tag
Test build #23056 has started for PR 3147 at commit
|
Test build #23056 has finished for PR 3147 at commit
|
Test PASSed. |
<artifactId>maven-shade-plugin</artifactId> | ||
<configuration> | ||
<shadedArtifactAttached>false</shadedArtifactAttached> | ||
<outputFile>${project.build.directory}/scala-${scala.binary.version}/spark-network-yarn-${project.version}-hadoop${hadoop.version}.jar</outputFile> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we name this something like spark-yarn-shuffle-hadoop${hadoop.version}
? The current name is very generic and, unlike our internal build, this will be user-facing since some folks might need to actually copy this jar into a location for YARN. It might be good to make it very obvious what this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - if this is going to be compatible across multiple YARN versions, maybe we should actually just put the Spark version instead: spark-${project.version}-yarn-shuffle
.
Hey Andrew - this looks good. I added some comment, all were regarding how we name the produced jar. |
66a7868
to
bda58d0
Compare
Test build #23191 has started for PR 3147 at commit
|
Test build #23191 has finished for PR 3147 at commit
|
Test PASSed. |
LGTM |
Ok, I merge into master and 1.2 |
Okay, you merge |
This is another addendum to #3082, which added the Yarn shuffle service to run inside the NM. This PR makes the feature much more usable by packaging enough dependencies into the jar to run the service inside an NM. After these changes, the user can run `./make-distribution.sh` and find a `spark-network-yarn*.jar` in their `lib` directory. The equivalent change is done in SBT by making the `network-yarn` module an assembly project. Author: Andrew Or <andrew@databricks.com> Closes #3147 from andrewor14/yarn-shuffle-build and squashes the following commits: bda58d0 [Andrew Or] Fix line too long 81e9705 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-build fb7f398 [Andrew Or] Rename jar to spark-{VERSION}-yarn-shuffle.jar 65db822 [Andrew Or] Actually mark slf4j as provided abcefd1 [Andrew Or] Do the same for SBT c653028 [Andrew Or] Package network-yarn and its dependencies (cherry picked from commit aa43a8d) Signed-off-by: Andrew Or <andrew@databricks.com>
I ran the
|
I had that as well when I did provide -Pyarn to |
Good catch. The jar won't be there unless |
Alright the hot fix is merged. Thanks for reporting. |
This is introduced in #3147 and is failing builds without the `-Pyarn` profile. Author: Andrew Or <andrew@databricks.com> Closes #3250 from andrewor14/fix-yarn-shuffle-build and squashes the following commits: 42b3d37 [Andrew Or] Do not fail fast if Yarn shuffle jar does not exist (cherry picked from commit a0fa1ba) Signed-off-by: Andrew Or <andrew@databricks.com>
This is another addendum to #3082, which added the Yarn shuffle service to run inside the NM. This PR makes the feature much more usable by packaging enough dependencies into the jar to run the service inside an NM. After these changes, the user can run
./make-distribution.sh
and find aspark-network-yarn*.jar
in theirlib
directory. The equivalent change is done in SBT by making thenetwork-yarn
module an assembly project.