-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip unshim and dedup of external spark-rapids-jni and jucx [databricks] #5603
Conversation
Signed-off-by: Chong Gao <res_life@163.com>
build |
build |
build |
Should target branch-22.08 |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been tested with the -DallowConventionalDistJar
feature added in #5528?
aggregator/pom.xml
Outdated
@@ -55,6 +55,18 @@ | |||
<version>${project.version}</version> | |||
<classifier>${spark.version.classifier}</classifier> | |||
</dependency> | |||
<!-- set provided scope for JNI and ucx to avoid pulling into aggregator jar --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cannot find a way to automatically pull in immediate provided dependencies, there needs to be a comment here explaining that any changes here likely impact the corresponding dependency list maintained in dist/pom.xml.
Tested, and it works.
Also tested
|
aggregator/pom.xml
Outdated
@@ -55,6 +55,18 @@ | |||
<version>${project.version}</version> | |||
<classifier>${spark.version.classifier}</classifier> | |||
</dependency> | |||
<!-- set provided scope for JNI and ucx to avoid pulling into aggregator jar --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could simply add these artifacts to the exclude list of the shade artifactSet
<artifactSet>
<excludes>
<exclude>org.slf4j:*</exclude>
<exclude>com.nvidia:spark-rapids-jni:*</exclude>
<exclude>org.openucx:jucx:*</exclude>
</excludes>
</artifactSet>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this one is more natural. Done.
<goals> | ||
<goal>unpack</goal> | ||
</goals> | ||
<configuration> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have a more obvious way of reasoning about the order of the dependency processing I would move this unpacks inside ant build-parallel-worlds.xml
to copy-dependencies
target either before or after the for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not found a way to invoke the unpack
goal of maven-dependency-plugin
in the maven-antrun-plugin
.
Just coping the jars in the local ~/.m2
path is not so good, because we can specify the local repository path when executing mvn
commands. The maven-dependency-plugin
is responsible to copy the right jars.
Now the unpack
works and the order is right, let's keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can be done similar to the dependency:get code there.
+ <target name="unpack-ucx-and-jni">
+ <resources id="unpackArray">
+ <string>com.nvidia:spark-rapids-jni:${project.version}:jar:${cuda.version}</string>
+ <string>org.openucx:jucx:${ucx.version}</string>
+ </resources>
+ <pathconvert property="unpackList" refid="unpackArray" pathsep=","/>
+ <ac:for list="${unpackList}" param="unpackArtifact" trim="true">
+ <sequential>
+ <echo level="info">dependency:unpack @{unpackArtifact}</echo>
+ <exec executable="${maven.home}/bin/mvn" failonerror="true">
+ <arg value="dependency:unpack"/>
+ <arg value="-s" if:set="env.URM_URL"/>
+ <arg value="${project.basedir}/../jenkins/settings.xml" if:set="env.URM_URL"/>
+ <arg value="-B"/>
+ <arg value="-Dmaven.repo.local=${maven.repo.local}" if:set="maven.repo.local"/>
+ <arg value="-Dartifact=@{unpackArtifact}"/>
+ <arg value="-DoutputDirectory=${project.build.directory}/parallel-world"/>
+ </exec>
+ </sequential>
+ </ac:for>
+ </target>
Your suggestion is fine with me, however, I am biased to keeping related code in one place even if it's a little hacky.
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The base branch was changed.
build |
1 similar comment
build |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Re-target 22.08 |
…ks] (NVIDIA#5603) * Unshim spark-rapids-jni and ucx * Manage versions in the dependency management section * Use the shade plugin to exclude external jars in the aggregator submodule Signed-off-by: Chong Gao <res_life@163.com>
Closes #5294
Changes:
jucx
andspark-rapids-jni
into every aggregator jar by marking them asprovided
dependency scope. Now the aggregator jar is less than 10M.dist
jar after the deduping aggregator jar is done.Note:
This PR supports building against specified spark-rapids-jni snapshot jar by specify
'-Dspark-rapids-jni.version=xx'
Signed-off-by: Chong Gao res_life@163.com