[BUG] fast dist assembly no longer possible due to artifacts missing #11428

Open
gerashegalov opened this issue Sep 5, 2024 · 1 comment
Labels: bug (Something isn't working), build (Related to CI / CD or cleanly building)

gerashegalov commented Sep 5, 2024

Describe the bug

Building the dist module alone, without invoking lengthy (re)builds of all aggregated shims, no longer works.

Moreover, it means one cannot easily reproduce locally the multi-shim assembly build that includes the Databricks shims, which are not buildable locally at all and require multiple DBR build environments (11.3, 13.3, etc.).

According to @NvTimLiu it might be related to #11301

Steps/Code to reproduce bug

URM_URL=https://maven.repo.internal  mvn -pl dist clean package -PnoSnapshotsWithDatabricks -Dignore.shim.revisions.check=true -s jenkins/settings.xml

Error:

10:59:40,434 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (create-parallel-world) on project rapids-4-spark_2.12: An Ant BuildException has occured: The following error occurred while executing this line:
10:59:40,435 [ERROR] /home/user/gits/NVIDIA/spark-rapids/dist/maven-antrun/build-parallel-worlds.xml:80: Traceback (most recent call last):
10:59:40,436 [ERROR]   File "<script>", line 73, in <module>
10:59:40,437 [ERROR]   File "<script>", line 26, in shell_exec
10:59:40,437 [ERROR]    at org.apache.tools.ant.taskdefs.optional.script.ScriptDefBase.fail(ScriptDefBase.java:129)
10:59:40,437 [ERROR]    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10:59:40,438 [ERROR]    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
10:59:40,438 [ERROR]    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10:59:40,438 [ERROR]    at java.lang.reflect.Method.invoke(Method.java:498)
10:59:40,438 [ERROR] org.apache.tools.ant.BuildException: failed to execute [u'/usr/share/maven/bin/mvn', 'org.apache.maven.plugins:maven-dependency-plugin:2.10:get', '-B', u'-Ddest=/home/user/gits/NVIDIA/spark-rapids/dist/target/deps', '-DgroupId=com.nvidia', u'-DartifactId=rapids-4-spark-sql-plugin-api_2.12', u'-Dversion=24.10.0-SNAPSHOT', '-Dpackaging=jar', u'-Dclassifier=spark352', '-Dtransitive=false', '-s', u'/home/user/gits/NVIDIA/spark-rapids/jenkins/settings.xml']
10:59:40,439 [ERROR] 
10:59:40,439 [ERROR] around Ant part ...<ant antfile="${spark.rapids.source.basedir}/dist/maven-antrun/build-parallel-worlds.xml" target="remove-dependencies-from-pom" />... @ 9:135 in /home/user/gits/NVIDIA/spark-rapids/dist/target/antrun/build-main.xml
10:59:40,440 [ERROR] -> [Help 1]
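
To see the resolver error hidden behind the Ant failure, the failing `dependency:get` step can be rerun by hand for a single classifier. The following is a minimal sketch, assuming it is run from the repository root. `fetch_shim_artifact` and the `RUN` switch are hypothetical names introduced here; the Maven arguments are copied from the log above, with the destination path made relative.

```shell
# Hypothetical helper (not part of the spark-rapids build): prints the
# dependency:get command for one shim classifier, or executes it when RUN=1.
# Assumption: the current directory is the spark-rapids repository root.
fetch_shim_artifact() {
  classifier="$1"
  version="${2:-24.10.0-SNAPSHOT}"
  # Rebuild the positional parameters as the Maven command line.
  set -- mvn org.apache.maven.plugins:maven-dependency-plugin:2.10:get -B \
    -Ddest=dist/target/deps \
    -DgroupId=com.nvidia \
    -DartifactId=rapids-4-spark-sql-plugin-api_2.12 \
    "-Dversion=$version" \
    -Dpackaging=jar \
    "-Dclassifier=$classifier" \
    -Dtransitive=false \
    -s jenkins/settings.xml
  if [ "${RUN:-0}" = "1" ]; then
    "$@"                    # actually run Maven
  else
    printf '%s\n' "$*"      # dry run: only show the command
  fi
}
```

Calling `fetch_shim_artifact spark352` prints the command, and `RUN=1 fetch_shim_artifact spark352` runs it, which should show whether the `spark352` classifier artifact is really missing from the configured repositories.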

Expected behavior
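The `mvn -pl dist clean package -PnoSnapshotsWithDatabricks` command above should succeed without rebuilding the aggregated shims.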

Environment details (please complete the following information)
local

Additional context
#11301 , #11308

gerashegalov added the labels "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) on Sep 5, 2024
mattahrens added the label "build" (Related to CI / CD or cleanly building) and removed the label "? - Needs Triage" on Sep 10, 2024

gerashegalov commented Sep 10, 2024

The nightly SNAPSHOT rapids-4-spark_2.12-24.10.0-20240910.120441-33.jar appears to correctly contain all Databricks shims:

$ jar tvf ~/Downloads/rapids-4-spark_2.12-24.10.0-20240910.120441-33.jar | grep 'spark3.*db.*SparkShimService' 
    66 Tue Sep 10 11:59:38 PDT 2024 spark330db/META-INF/services/com.nvidia.spark.rapids.SparkShimServiceProvider
    66 Tue Sep 10 11:59:50 PDT 2024 spark332db/META-INF/services/com.nvidia.spark.rapids.SparkShimServiceProvider
    66 Tue Sep 10 12:00:02 PDT 2024 spark341db/META-INF/services/com.nvidia.spark.rapids.SparkShimServiceProvider
   986 Tue Sep 10 12:01:14 PDT 2024 spark-shared/com/nvidia/spark/rapids/shims/spark330db/SparkShimServiceProvider$.class
  2119 Tue Sep 10 12:01:14 PDT 2024 spark-shared/com/nvidia/spark/rapids/shims/spark330db/SparkShimServiceProvider.class
   986 Tue Sep 10 12:01:14 PDT 2024 spark-shared/com/nvidia/spark/rapids/shims/spark332db/SparkShimServiceProvider$.class
  2119 Tue Sep 10 12:01:14 PDT 2024 spark-shared/com/nvidia/spark/rapids/shims/spark332db/SparkShimServiceProvider.class
   986 Tue Sep 10 12:01:14 PDT 2024 spark-shared/com/nvidia/spark/rapids/shims/spark341db/SparkShimServiceProvider$.class
  2119 Tue Sep 10 12:01:14 PDT 2024 spark-shared/com/nvidia/spark/rapids/shims/spark341db/SparkShimServiceProvider.class
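
The same check can be reduced to a list of shim names. This is a rough sketch, assuming every shim ships its service file under a top-level `sparkNNN[suffix]/META-INF/services/` directory as in the listing above; `list_shims` is a hypothetical helper name. It expects plain entry paths on stdin, so it should be fed from `jar tf` rather than `jar tvf`.

```shell
# Hypothetical helper: reads jar entry paths (one per line) on stdin and
# prints each distinct top-level shim directory that contains a
# SparkShimServiceProvider service file.
list_shims() {
  grep -E '^spark[0-9]+[a-z]*/META-INF/services/com\.nvidia\.spark\.rapids\.SparkShimServiceProvider$' \
    | cut -d/ -f1 \
    | sort -u
}
```

For example, `jar tf ~/Downloads/rapids-4-spark_2.12-24.10.0-20240910.120441-33.jar | list_shims | grep db` would narrow the output to the Databricks shims.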

@pxLi @NvTimLiu can you please chime in on the right way to reproduce the dist assembly with Databricks shims on a dev machine after #11301?

It looks like specifying an explicit list of buildvers succeeds without running into the missing-dependencies issue:

mvn -pl dist clean package -Dincluded_buildvers=350,330db,332db,341db -Ddist.jar.compress=false -Dignore.shim.revisions.check=true -s jenkins/settings.xml 

We can use this issue either to fix the WithDatabricks profiles so they behave the same way as the explicit buildver list, or to remove those profiles and update the doc.
