Detect multiple jars on the classpath when init plugin [databricks] #9654

Merged
35 commits merged into NVIDIA:branch-24.02 on Nov 28, 2023

Conversation


@thirtiseven thirtiseven commented Nov 7, 2023

Closes #7309

This PR detects whether there are multiple plugin/cudf/jni jars on the classpath when initializing RapidsDriverPlugin and RapidsExecutorPlugin, and throws an exception if so. It also adds a config to allow multiple jars and suppress the error.

Note that this PR does not account for the case where the duplication is spread across the user and system classloaders, e.g. one jar in a system classloader via extraClassPath or because it is in $SPARK_HOME/jars, and another via spark.jars. I will try to check this or file a follow-on issue for it.

Note that the exception will only be thrown if the first plugin jar on the classpath is one built after this PR, so the same issue may still occur in the short term.
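
For illustration, a minimal sketch of the detection approach described above, assuming the check works by looking up the shared version-info resource through the classloader; the object and method names (and the boolean form of the config) are illustrative, not the PR's actual API:

import java.net.URL
import scala.collection.JavaConverters._

object MultipleJarCheckSketch {
  // The dist jar ships a version-info properties file; multiple copies of this
  // resource on the classpath indicate multiple plugin jars.
  private val propName = "rapids4spark-version-info.properties"

  // Group every copy of the version-info resource by its revision= value.
  def revisionMap(classloader: ClassLoader): Map[String, Seq[URL]] = {
    classloader.getResources(propName).asScala.toSeq.groupBy { url =>
      scala.io.Source.fromURL(url).getLines()
        .collectFirst { case line if line.startsWith("revision=") =>
          line.stripPrefix("revision=") }
        .getOrElse("UNKNOWN")
    }
  }

  // Fail unless exactly one jar is present, or only warn when multiple jars are allowed.
  def check(classloader: ClassLoader, allowMultipleJars: Boolean): Unit = {
    val revisions = revisionMap(classloader)
    val msg = s"Multiple rapids-4-spark jars found in the classpath: $revisions"
    if (allowMultipleJars) {
      // stand-in for the plugin's logWarning
      if (revisions.values.map(_.size).sum > 1) System.err.println(msg)
    } else {
      require(revisions.size == 1 && revisions.values.forall(_.size == 1), msg)
    }
  }
}

In the plugin this kind of check would run once during driver and executor plugin initialization, which is why the warnings in the test log below appear twice.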

Tests

$ spark-shell --master local[*] --jars $(find ~/spark-rapids/dist/ -name '*spark*.jar' | xargs printf "%s,") --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.allowMultipleJars=true

23/11/15 15:41:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/11/15 15:41:05 WARN RapidsPluginUtils: RAPIDS Accelerator 23.12.0-SNAPSHOT using cudf 23.12.0-SNAPSHOT.
23/11/15 15:41:05 WARN RapidsPluginUtils: Multiple rapids-4-spark jars with different revision found in the classpath:
revison: ded091e8777a137b41964c6aad4953421f8ca7ad
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda113.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=ded091e8777a137b41964c6aad4953421f8ca7ad
branch=detect_multiple_jars
date=2023-11-09T14:00:37Z
url=https://github.com/thirtiseven/spark-rapids.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda112.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=ded091e8777a137b41964c6aad4953421f8ca7ad
branch=detect_multiple_jars
date=2023-11-09T14:00:37Z
url=https://github.com/thirtiseven/spark-rapids.git
revison: 3dc12e49ffb194eff7dcc0d00cde3d81f17d01e1
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=3dc12e49ffb194eff7dcc0d00cde3d81f17d01e1
branch=detect_multiple_jars
date=2023-11-15T07:31:16Z
url=https://github.com/thirtiseven/spark-rapids.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/deps/rapids-4-spark-aggregator_2.12-23.12.0-SNAPSHOT-spark341.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=3dc12e49ffb194eff7dcc0d00cde3d81f17d01e1
branch=detect_multiple_jars
date=2023-11-15T07:31:16Z
url=https://github.com/thirtiseven/spark-rapids.git

Please make sure there is only one rapids-4-spark jar in the classpath. If it is impossible to fix the classpath you can suppress the error by setting spark.rapids.sql.allowMultipleJars to true.
23/11/15 15:41:05 WARN RapidsPluginUtils: Multiple spark-rapids-jni jars with different revision found in the classpath:
revison: a87a4039372b0b1e0bca596866c5cedbe1e0a845
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda113.jar
version=23.12.0-SNAPSHOT
user=
revision=a87a4039372b0b1e0bca596866c5cedbe1e0a845
branch=float_to_string
date=2023-11-09T01:13:08Z
url=https://github.com/thirtiseven/spark-rapids-jni.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda112.jar
version=23.12.0-SNAPSHOT
user=
revision=a87a4039372b0b1e0bca596866c5cedbe1e0a845
branch=float_to_string
date=2023-11-09T01:13:08Z
url=https://github.com/thirtiseven/spark-rapids-jni.git
revison: 9685ff68547b98f81c9b245478bcec2cdfb237e1
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
user=
revision=9685ff68547b98f81c9b245478bcec2cdfb237e1
branch=HEAD
date=2023-11-14T05:13:43Z
url=https://github.com/NVIDIA/spark-rapids-jni.git

Please make sure there is only one spark-rapids-jni jar in the classpath. If it is impossible to fix the classpath you can suppress the error by setting spark.rapids.sql.allowMultipleJars to true.
23/11/15 15:41:05 WARN RapidsPluginUtils: Multiple cudf jars with different revision found in the classpath:
revison: b3e32795bbcb575c526364e77edee3a44aa1ff3f
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda113.jar
version=23.12.0-SNAPSHOT
user=
revision=b3e32795bbcb575c526364e77edee3a44aa1ff3f
branch=HEAD
date=2023-11-09T01:13:08Z
url=https://github.com/rapidsai/cudf.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda112.jar
version=23.12.0-SNAPSHOT
user=
revision=b3e32795bbcb575c526364e77edee3a44aa1ff3f
branch=HEAD
date=2023-11-09T01:13:08Z
url=https://github.com/rapidsai/cudf.git
revison: 4313cfa9b3fcff41f67b48ac8797dc015d441ecc
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
user=
revision=4313cfa9b3fcff41f67b48ac8797dc015d441ecc
branch=HEAD
date=2023-11-14T05:13:43Z
url=https://github.com/rapidsai/cudf.git

Please make sure there is only one cudf jar in the classpath. If it is impossible to fix the classpath you can suppress the error by setting spark.rapids.sql.allowMultipleJars to true.
23/11/15 15:41:05 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.
23/11/15 15:41:05 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
23/11/15 15:41:11 WARN RapidsPluginUtils: Multiple rapids-4-spark jars with different revision found in the classpath:
revison: ded091e8777a137b41964c6aad4953421f8ca7ad
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda113.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=ded091e8777a137b41964c6aad4953421f8ca7ad
branch=detect_multiple_jars
date=2023-11-09T14:00:37Z
url=https://github.com/thirtiseven/spark-rapids.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda112.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=ded091e8777a137b41964c6aad4953421f8ca7ad
branch=detect_multiple_jars
date=2023-11-09T14:00:37Z
url=https://github.com/thirtiseven/spark-rapids.git
revison: 3dc12e49ffb194eff7dcc0d00cde3d81f17d01e1
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=3dc12e49ffb194eff7dcc0d00cde3d81f17d01e1
branch=detect_multiple_jars
date=2023-11-15T07:31:16Z
url=https://github.com/thirtiseven/spark-rapids.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/deps/rapids-4-spark-aggregator_2.12-23.12.0-SNAPSHOT-spark341.jar
version=23.12.0-SNAPSHOT
cudf_version=23.12.0-SNAPSHOT
user=haoyangl
revision=3dc12e49ffb194eff7dcc0d00cde3d81f17d01e1
branch=detect_multiple_jars
date=2023-11-15T07:31:16Z
url=https://github.com/thirtiseven/spark-rapids.git

Please make sure there is only one rapids-4-spark jar in the classpath. If it is impossible to fix the classpath you can suppress the error by setting spark.rapids.sql.allowMultipleJars to true.
23/11/15 15:41:11 WARN RapidsPluginUtils: Multiple spark-rapids-jni jars with different revision found in the classpath:
revison: a87a4039372b0b1e0bca596866c5cedbe1e0a845
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda113.jar
version=23.12.0-SNAPSHOT
user=
revision=a87a4039372b0b1e0bca596866c5cedbe1e0a845
branch=float_to_string
date=2023-11-09T01:13:08Z
url=https://github.com/thirtiseven/spark-rapids-jni.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda112.jar
version=23.12.0-SNAPSHOT
user=
revision=a87a4039372b0b1e0bca596866c5cedbe1e0a845
branch=float_to_string
date=2023-11-09T01:13:08Z
url=https://github.com/thirtiseven/spark-rapids-jni.git
revison: 9685ff68547b98f81c9b245478bcec2cdfb237e1
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
user=
revision=9685ff68547b98f81c9b245478bcec2cdfb237e1
branch=HEAD
date=2023-11-14T05:13:43Z
url=https://github.com/NVIDIA/spark-rapids-jni.git

Please make sure there is only one spark-rapids-jni jar in the classpath. If it is impossible to fix the classpath you can suppress the error by setting spark.rapids.sql.allowMultipleJars to true.
23/11/15 15:41:11 WARN RapidsPluginUtils: Multiple cudf jars with different revision found in the classpath:
revison: b3e32795bbcb575c526364e77edee3a44aa1ff3f
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda113.jar
version=23.12.0-SNAPSHOT
user=
revision=b3e32795bbcb575c526364e77edee3a44aa1ff3f
branch=HEAD
date=2023-11-09T01:13:08Z
url=https://github.com/rapidsai/cudf.git
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda112.jar
version=23.12.0-SNAPSHOT
user=
revision=b3e32795bbcb575c526364e77edee3a44aa1ff3f
branch=HEAD
date=2023-11-09T01:13:08Z
url=https://github.com/rapidsai/cudf.git
revison: 4313cfa9b3fcff41f67b48ac8797dc015d441ecc
jar URL: jar:file:/home/haoyangl/spark-rapids/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
version=23.12.0-SNAPSHOT
user=
revision=4313cfa9b3fcff41f67b48ac8797dc015d441ecc
branch=HEAD
date=2023-11-14T05:13:43Z
url=https://github.com/rapidsai/cudf.git

Please make sure there is only one cudf jar in the classpath. If it is impossible to fix the classpath you can suppress the error by setting spark.rapids.sql.allowMultipleJars to true.
Spark context Web UI available at http://dev-machine:4040
Spark context available as 'sc' (master = local[*], app id = local-1700034065117).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 1.8.0_382)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

@sameerz sameerz added the "reliability" label (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Nov 7, 2023
gerashegalov previously approved these changes Nov 9, 2023

@gerashegalov gerashegalov left a comment


LGTM

@gerashegalov gerashegalov changed the title Detect multiple jars on the classpath when init plugin Detect multiple jars on the classpath when init plugin [databricks] Nov 9, 2023
@gerashegalov

build

@thirtiseven

build

gerashegalov previously approved these changes Nov 9, 2023
@thirtiseven

build

@gerashegalov

The integration tests fail because we add rapids4spark-version-info.properties to all intermediate jars as well as the final jars, including the rapids-4-spark-integration-tests jar, which we have not published since 21.12.

We can deal with whether we should continue adding version info to the intermediate jars separately.

For a fix:

  • we could check for public classes instead such as "com/nvidia/spark/SQLPlugin.class"
  • or filter out URLs containing "test"

I lean toward the second option (a sketch follows below).
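
A hedged sketch of that second option, assuming the duplicate check gathers version-info URLs from the classloader; the object and method names are illustrative, not the PR's final code:

import java.net.URL
import scala.collection.JavaConverters._

object NonTestJarFilterSketch {
  // Ignore version-info resources coming from test jars (for example the
  // rapids-4-spark-integration-tests jar) before checking for duplicate plugin jars.
  def nonTestVersionInfoUrls(classloader: ClassLoader): Seq[URL] = {
    val propName = "rapids4spark-version-info.properties"
    classloader.getResources(propName).asScala.toSeq
      .filterNot(_.toString.contains("test"))
  }
}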

@jlowe

jlowe commented Nov 9, 2023

We publish the integration tests jar internally to aid in nightly test runs. Another approach is to make the properties file in the dist artifact unique relative to intermediate jars. Arguably all of the property files should be unique, even across intermediates, since we sometimes decide to publish those intermediates, even if only internally.

@gerashegalov

Another approach is to make the properties file in the dist artifact unique relative to intermediate jars. Arguably all of the property files should be unique, even across intermediates, since we sometimes decide to publish those intermediates, even if only internally.

This sounds good to me. As for the version file generation, it is easy to interpolate ${project.artifactId} into the file name. This will slightly complicate things and require adjustments in the detection logic and elsewhere.
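
A rough sketch of what an artifact-specific file name could look like on the detection side; the naming scheme and helper below are hypothetical, since the rename was deferred rather than done in this PR:

object VersionInfoNaming {
  // Hypothetical scheme: embed the Maven ${project.artifactId} in the version-info
  // file name so each published artifact ships a uniquely named file.
  def resourceNameFor(artifactId: String): String =
    s"rapids4spark-version-info-$artifactId.properties"
}

// e.g. the dist jar would carry "rapids4spark-version-info-rapids-4-spark_2.12.properties"
// and the detection code would look up exactly that name rather than a shared one.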

@gerashegalov gerashegalov self-requested a review November 9, 2023 18:40
@thirtiseven

build

Comment on lines 145 to 156
conf.allowMultipleJars match {
  case "ALWAYS" =>
    logWarning(msg)
  case "SAME_REVISION" =>
    require(revisionMap.size == 1, msg)
  case "NEVER" =>
    require(revisionMap.size == 1 && revisionMap.values.forall(_.size == 1), msg)
  case _ =>
    throw new IllegalArgumentException(s"Invalid value for " +
      s"${RapidsConf.ALLOW_MULTIPLE_JARS.key}: ${conf.allowMultipleJars}. " +
      s"Valid values are ALWAYS, SAME_REVISION, NEVER.")
}
Collaborator


Let us follow the pattern of other configs backed by an enum. Take a look at PARQUET_READER_FOOTER_TYPE and make an auxiliary function translating strings <-> enum values.

Collaborator Author


done
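
For reference, a minimal sketch of the string <-> enum pattern being suggested, assuming a plain scala.Enumeration; the object name and error message are illustrative, not the PR's actual code:

import java.util.Locale

object AllowMultipleJars extends Enumeration {
  val ALWAYS, SAME_REVISION, NEVER = Value

  // Auxiliary translation from the user-facing config string to the enum value,
  // mirroring how other enum-backed configs validate their input.
  def fromString(s: String): Value =
    values.find(_.toString == s.toUpperCase(Locale.ROOT)).getOrElse {
      throw new IllegalArgumentException(
        s"Invalid value for spark.rapids.sql.allowMultipleJars: $s. " +
        s"Valid values are ${values.mkString(", ")}")
    }
}

The match shown earlier could then operate on these enum values, with invalid strings rejected once at config-parsing time.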

"values are ALWAYS: allow all jars, SAME_REVISION: only allow jars with the same " +
"revision, NEVER: do not allow multiple jars at all.")
.stringConf
.createWithDefault("SAME_REVISION")
Collaborator


I still think NEVER should be the default, and SAME_REVISION should only be set in the tests that are hard to fix right away, with an issue filed to get them fixed later.

Collaborator Author


Ok, will file a follow-on issue on it.

What are the possible issues if there are multiple jars of the same revision on the classpath?


@gerashegalov gerashegalov Nov 23, 2023


Here are some problems going undetected:

It is fine to have a separate issue to sort the right default value out.

gerashegalov previously approved these changes Nov 23, 2023

@gerashegalov gerashegalov left a comment


LGTM

}
val revisionMap: Map[String, Seq[URL]] = possibleRapidsJarURLs.map { url =>
  val versionInfo = scala.io.Source.fromURL(url).getLines().toSeq
  lazy val revision = versionInfo
Collaborator


nit: no point in lazy

Suggested change:
-  lazy val revision = versionInfo
+  val revision = versionInfo

Comment on lines 124 to 125
  .filter(_.startsWith("revision="))
  .map(_.split("=").last)
Collaborator


if you define a regex

val revisionRegex = "revision=(.*)".r

you could do it in one shot

Suggested change:
-  .filter(_.startsWith("revision="))
-  .map(_.split("=").last)
+  .collect {
+    case revisionRegex(revision) => revision
+  }

@thirtiseven

build


@gerashegalov gerashegalov left a comment


At the minimum, retest the functionality manually. Bonus points: add unit tests

NEVER fails for me with a single jar

SPARK_HOME=~/dist/spark-3.3.0-bin-hadoop3 \
  ~/dist/spark-3.3.0-bin-hadoop3/bin/spark-shell \
  --jars dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.allowMultipleJars=NEVER

java.lang.IllegalArgumentException: requirement failed: Multiple rapids-4-spark jars found in the classpath:
revison: f5bd0c958c1afc86a3e7906684809fc62e21d8ab
        jar URL: jar:file:/home/gshegalov/gits/NVIDIA/spark-rapids.worktrees/reviews/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
        version=23.12.0-SNAPSHOT
        cudf_version=23.12.0-SNAPSHOT
        user=gshegalov
        revision=f5bd0c958c1afc86a3e7906684809fc62e21d8ab
        branch=pr/thirtiseven/9654
        date=2023-11-24T03:03:31Z
        url=https://github.com/NVIDIA/spark-rapids
        jar URL: jar:file:/home/gshegalov/gits/NVIDIA/spark-rapids.worktrees/reviews/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar
        version=23.12.0-SNAPSHOT
        cudf_version=23.12.0-SNAPSHOT
        user=gshegalov
        revision=f5bd0c958c1afc86a3e7906684809fc62e21d8ab
        branch=pr/thirtiseven/9654
        date=2023-11-24T03:03:31Z
        url=https://github.com/NVIDIA/spark-rapids

@thirtiseven

At the minimum, retest the functionality manually. Bonus points: add unit tests

NEVER fails for me with a single jar


It's weird, I can't reproduce this with 330/341 and with or without upmerge.

@gerashegalov

gerashegalov commented Nov 24, 2023

It's weird, I can't reproduce this with 330/341 and with or without upmerge.

@thirtiseven it looks like a single shim jar does not reproduce the issue indeed.

my reproducer involves a two-shim jar built with:

./build/buildall --profile=330,350

on your PR branch

the problem is related to the fact that we are currently sloppy with the inclusion of the properties files

7d11a4ff004e53dac97819a387007b2497e2819e -> Vector(
  jar:file:/home/gshegalov/gits/NVIDIA/spark-rapids.worktrees/reviews/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar!/rapids4spark-version-info.properties, 
  jar:file:/home/gshegalov/gits/NVIDIA/spark-rapids.worktrees/reviews/dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar!/spark330/rapids4spark-version-info.properties
)

Let us consider only the files stored under !/, and ignore subdirs for simplicity

@thirtiseven thirtiseven changed the base branch from branch-23.12 to branch-24.02 November 27, 2023 04:26
@thirtiseven

@gerashegalov Thank you! I can reproduce now.
Just updated the code to filter out files under subdirectories and submodule jars, and tested again. Trying to add a unit test.

Comment on lines 118 to 124
val possibleRapidsJarURLs = classloader.getResources(propName).asScala.toSet.toSeq.filter {
  url => {
    val urlPath = url.toString
    // filter out submodule jars and files stored under subdirs of '!/'
    !urlPath.contains("rapids-4-spark-") && urlPath.contains("!/" + propName)
  }
}
Collaborator


Not too easy to follow, but I get it now. In the comment, can you add an example of the submodule vs. main dist artifact you are looking for?

However, now it looks like even NEVER will never find out when we put submodule jars on the classpath for tests by accident?


@thirtiseven thirtiseven Nov 28, 2023


However, now it looks like even NEVER will never find out when we put submodule jars on the classpath for tests by accident?

How about adding another config mode like RECURSIVE to detect all jars on the classpath and print a warning log? We would just need to skip this filter under that mode.

Collaborator


What do you mean by RECURSIVE?

If RECURSIVE means going down a directory level inside a jar file, I think we already established that we should only look at the root entries ending with "!/" + propName.

If RECURSIVE means extending the check to all submodule artifact ids prefixed with rapids-4-spark-, I realize that we do not have a good story now, since changing the version-info file name to contain the artifact id is not part of this PR. Let us defer this to the follow-up issue you were going to file.

Collaborator Author


If RECURSIVE means extending the check to all submodule artifact ids prefixed with rapids-4-spark-, I realize that we do not have a good story now, since changing the version-info file name to contain the artifact id is not part of this PR. Let us defer this to the follow-up issue you were going to file.

Yes, I meant it that way. I will add it to the follow-up issue too.
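
To make the filter above concrete, here is a small illustrative example (the paths and version strings are made up) showing which URL shapes its predicate keeps and drops:

object JarUrlExamples {
  val propName = "rapids4spark-version-info.properties"

  // kept: version-info file at the root ("!/") of the main dist jar
  val distRoot =
    "jar:file:/path/rapids-4-spark_2.12-24.02.0-cuda11.jar!/rapids4spark-version-info.properties"
  // dropped: same file under a per-shim subdirectory of "!/"
  val shimSubdir =
    "jar:file:/path/rapids-4-spark_2.12-24.02.0-cuda11.jar!/spark330/rapids4spark-version-info.properties"
  // dropped: file inside a submodule jar (URL contains "rapids-4-spark-")
  val aggregator =
    "jar:file:/path/deps/rapids-4-spark-aggregator_2.12-24.02.0-spark341.jar!/rapids4spark-version-info.properties"

  // Reproduces the filter's predicate from the snippet above on these example strings.
  def keep(urlPath: String): Boolean =
    !urlPath.contains("rapids-4-spark-") && urlPath.contains("!/" + propName)
}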


@gerashegalov gerashegalov left a comment


LGTM, thanks for working through the review issues.

I believe we have a very useful MVP already, and it can be improved in the follow-up PRs.

@thirtiseven

Thanks for your patient review and useful suggestions @gerashegalov

@thirtiseven

build

@thirtiseven thirtiseven merged commit 6f96048 into NVIDIA:branch-24.02 Nov 28, 2023
37 checks passed
@thirtiseven

Follow on issue: #9870

Labels
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
Development

Successfully merging this pull request may close these issues.

[FEA] Detect multiple versions of the RAPIDS jar on the classpath at the same time
6 participants