
Fix the mismatching default configs in integration tests #11283

Merged: 41 commits, Aug 15, 2024

Conversation

jihoonson (Collaborator)

Fixes #11134.

The default values of some configs used in the integration tests have gone stale because they were not updated when the actual defaults changed. This is because we keep a separate list of default configs for the integration tests, which has to be updated manually along with code changes. This PR fixes the problem by dumping all default configs to a file and letting the integration tests use it. A new interface, getDefault, has been added to ConfEntry. A new DumpDefaultConfigs class uses this API to dump all default configs to a file. In the integration tests, those default configs are loaded when the Spark session is initialized.
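
As a rough illustration of the mechanism described above, the sketch below shows how a conf entry carrying a compiled-in default could expose it through the new getDefault() hook. This is a minimal, self-contained sketch, not the plugin's actual class hierarchy: the SQLConf overload and registration machinery are omitted, and the ConfEntryWithDefault name here is illustrative.

    // Minimal sketch: a conf entry that knows its compiled-in default and
    // exposes it via getDefault(), so a dumper can enumerate all defaults.
    abstract class ConfEntry[T](val key: String, val converter: String => T, val doc: String) {
      def get(conf: Map[String, String]): T
      def getDefault(): T
    }

    class ConfEntryWithDefault[T](key: String, converter: String => T, doc: String,
        val defaultValue: T) extends ConfEntry[T](key, converter, doc) {
      // Fall back to the compiled-in default when the conf map has no value for the key.
      override def get(conf: Map[String, String]): T =
        conf.get(key).map(converter).getOrElse(defaultValue)
      override def getDefault(): T = defaultValue
    }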

@@ -124,6 +124,7 @@ abstract class ConfEntry[T](val key: String, val converter: String => T, val doc

def get(conf: Map[String, String]): T
def get(conf: SQLConf): T
def getDefault(): T
Collaborator:

It would rather make sense to omit the confs without defaults in the dump, IMO.

jihoonson (Collaborator Author) · Aug 1, 2024

So, I think this API should just return whatever the default is. Any business logic should not be involved at this level.

As for DumpDefaultConfigs, it dumps even those configs without defaults because we would want to unset them for the integration test if they were set somehow.
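
For illustration, a minimal sketch of the dump step under that reasoning: entries with a default are written with their value, while entries without one are written as null so the test harness knows to unset them if they were somehow set. The object and method names below are hypothetical, and the real DumpDefaultConfigs likely differs.

    import java.io.PrintWriter

    // Hypothetical sketch of dumping defaults to JSON. Configs without a
    // default are emitted as null so the integration test can unset them.
    object DumpDefaultsSketch {
      def dumpDefaults(entries: Seq[(String, Option[Any])], path: String): Unit = {
        val body = entries.map { case (key, default) =>
          val value = default.map(v => "\"" + v.toString + "\"").getOrElse("null")
          s"""  "$key": $value"""
        }.mkString("{\n", ",\n", "\n}")
        val writer = new PrintWriter(path)
        try writer.write(body) finally writer.close()
      }

      def main(args: Array[String]): Unit = {
        // Example usage with made-up keys; the first CLI arg is the output path.
        dumpDefaults(Seq(
          "spark.rapids.example.flag" -> Some(true),
          "spark.rapids.example.mode" -> None),
          args.headOption.getOrElse("spark-rapids-default-configs.json"))
      }
    }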

…a new stage for integration test to populate default configs

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
jihoonson (Collaborator Author) commented Aug 1, 2024

@@ -1439,7 +1439,7 @@ This will force full Scala code rebuild in downstream modules.
                 <plugin>
                     <groupId>org.codehaus.mojo</groupId>
                     <artifactId>exec-maven-plugin</artifactId>
-                    <version>3.0.0</version>
+                    <version>3.3.0</version>
                 </plugin>

Uh, the CI fails with this error. What should I do about it? I bumped this dependency version because the provided classpathScope was introduced in that version.

jihoonson (Collaborator Author) commented Aug 1, 2024

The CI error seems to be due to a missing version update for the corresponding dependency in the 2.13 pom file. I will fix it soon.

jlowe (Member) left a comment

Since this is adding a new dependency to the integration tests, it needs to update integration_tests/src/assembly/bin.xml to add the file to the integration tests tarball so CI test pipelines that do not build from source can run the integration tests properly.

jihoonson (Collaborator Author) commented Aug 1, 2024

@jlowe thanks for catching it. I added the new file in 8dbd354, but I'm not sure if I did it right. I opened up integration_tests/target/rapids-4-spark-integration-tests_2.12-24.08.0-SNAPSHOT-sparkxxx-jar-with-dependencies.jar, but I don't see it there. Should the file be added somewhere other than this jar file? In fact, I don't see any of the other files listed in integration_tests/src/assembly/bin.xml in there either.

jihoonson (Collaborator Author)

java.lang.reflect.InaccessibleObjectException: Unable to make protected void java.net.URLClassLoader.addURL(java.net.URL) accessible: module java.base does not "opens java.net" to unnamed module @6808428c
    at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:354)
    at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297)
    at java.lang.reflect.Method.checkCanSetAccessible (Method.java:200)
    at java.lang.reflect.Method.setAccessible (Method.java:194)
    at org.apache.commons.lang3.reflect.MethodUtils.invokeMethod (MethodUtils.java:839)
    at org.apache.commons.lang3.reflect.MethodUtils.invokeMethod (MethodUtils.java:802)
    at com.nvidia.spark.rapids.ShimLoader$.$anonfun$updateSparkClassLoader$2 (ShimLoader.scala:165)
    at com.nvidia.spark.rapids.ShimLoader$.$anonfun$updateSparkClassLoader$2$adapted (ShimLoader.scala:161)
    at scala.collection.immutable.List.foreach (List.scala:334)
    at com.nvidia.spark.rapids.ShimLoader$.$anonfun$updateSparkClassLoader$1 (ShimLoader.scala:161)
    at com.nvidia.spark.rapids.ShimLoader$.$anonfun$updateSparkClassLoader$1$adapted (ShimLoader.scala:160)
    at scala.Option.foreach (Option.scala:437)
    at com.nvidia.spark.rapids.ShimLoader$.updateSparkClassLoader (ShimLoader.scala:160)
    at com.nvidia.spark.rapids.ShimLoader$.getShimClassLoader (ShimLoader.scala:186)
    at com.nvidia.spark.rapids.ShimReflectionUtils$.loadClass (ShimReflectionUtils.scala:28)
    at com.nvidia.spark.rapids.tests.DumpDefaultConfigs$.main (DumpDefaultConfigs.scala:50)

Hmm, the CI is now failing with the error above. I think it's because of missing JVM options, such as --add-opens=java.base/java.net=ALL-UNNAMED for Java 17. I'm working on a fix.

jihoonson (Collaborator Author)

As for the CI failure with Java 17, to allow access to protected methods via reflection, I have two ideas:

  1. We can add JVM options in MAVEN_OPTS to allow it. This MAVEN_OPTS can be set in mvn-verify-check.yml so that the GitHub action picks it up. We should also update our docs to explain that those JVM options must be set to build with Java 17.
  2. Instead of letting the Maven process run DumpDefaultConfigs, we could spawn a new process by setting the goal of exec-maven-plugin to exec, which executes a Scala program. This would require making a jar with all the dependencies DumpDefaultConfigs needs, which currently seems to be only scopt besides the provided rapids-4-spark_2.12 dependency.

I'm leaning towards option 1 only because it is easier, but option 2 doesn't seem that hard either. Any opinions here?

jlowe (Member) commented Aug 2, 2024

Any opinions here?

We already have an example of running a class as a program with all of its dependencies to generate files during the build; it isn't doing any reflection shenanigans and, IIUC, works with the JDK17 build. See the update_config_docs rule in dist/pom.xml. Any reason we can't do the same here?

gerashegalov (Collaborator) commented Aug 2, 2024

+1 for following the pattern from update_config_docs. exec plugin executions run in the Maven JVM, and requiring particular MAVEN_OPTS is unnecessarily user-unfriendly. With the Ant java task (https://ant.apache.org/manual/Tasks/java.html) we can set fork=true and, if necessary, pass the same args we already use for unit tests, ${extraJavaTestArgs}, as jvmargs.

jihoonson (Collaborator Author)

Build

jihoonson (Collaborator Author)

Can someone help to run the CI? I don't seem to have enough power for that.

gerashegalov (Collaborator)

Can someone help to run the CI? I don't seem to have enough power for that.

Create a PR to add your GH user here

github.actor == 'SurajAralihalli'

gerashegalov (Collaborator)

build

jihoonson added the bug label (Something isn't working) on Aug 8, 2024
jihoonson (Collaborator Author)

The CI is failing because some tests in hash_aggregate_test are assuming spark.sql.legacy.allowNegativeScaleOfDecimal to be set to false by default, which is good, as this is exactly what we want to catch and fix with this PR 🙂 However, those tests (test_hash_grpby_sum and test_hash_grpby_sum_full_decimal) seem intended to test certain cases with decimals that have negative scales, so I think we should keep negative decimal scales enabled for these tests. In c4ba1a2, I added conf = copy_and_update(conf, {'spark.sql.legacy.allowNegativeScaleOfDecimal': 'true'}) to overwrite this property in each test. I'm not quite sure if this is the best way to do it, though; please let me know.

jlowe previously approved these changes Aug 9, 2024
gerashegalov previously approved these changes Aug 9, 2024
gerashegalov (Collaborator) left a comment

LGTM with one caveat:

Just double-checking:
beyond propagating the classpath, part of the reason why we looked into forking out a JVM for config dump execution is to propagate extra options for JDK17. #11283 (comment)

Comment on lines +102 to +111
<target>
  <taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
  <java classname="com.nvidia.spark.rapids.tests.DumpDefaultConfigs"
        failonerror="true"
        classpathref="maven.compile.classpath"
        fork="true">
    <arg value="json"/>
    <arg value="${project.build.directory}/spark-rapids-default-configs.json"/>
  </java>
</target>
Collaborator:

Curious why we end up not needing this #11283 (comment)

jihoonson (Collaborator Author):

Good question. I'm not quite sure why it works. I can add those Java options here just in case.

jihoonson dismissed stale reviews from gerashegalov and jlowe via 45beca3 on August 9, 2024, 20:39
jihoonson (Collaborator Author)

It is necessary to set fork=true, if the class being launched by this task or any libraries being used by that class, call APIs like java.lang.System.exit() or java.lang.Runtime.exit().

I have noticed that the RapidsConf task in dist/pom.xml should set fork=true as well per https://ant.apache.org/manual/Tasks/java.html since now it can call System.exit(). Fixed this in 45beca3.

jihoonson (Collaborator Author)

I have noticed that the RapidsConf task in dist/pom.xml should set fork=true as well per https://ant.apache.org/manual/Tasks/java.html since now it can call System.exit(). Fixed this in 45beca3.

NVM. The Java 8 build seems to hit some classpath issue with this change. It would be nice to fix, but I don't think the change that adds System.exit() to RapidsConf.main() is necessary for this PR, so I just reverted it.

jihoonson (Collaborator Author)

build

jihoonson (Collaborator Author)

build

jihoonson (Collaborator Author)

Hi @jlowe, I have fixed all tests and the CI is green now. Could you have another look?

jihoonson (Collaborator Author)

build

jihoonson merged commit bc8c577 into NVIDIA:branch-24.10 on Aug 15, 2024
43 checks passed
jihoonson added a commit to jihoonson/spark-rapids that referenced this pull request Aug 16, 2024
…DIA#11283)"

This reverts commit bc8c577.

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
zpuller pushed a commit that referenced this pull request Aug 17, 2024
)" (#11347)

This reverts commit bc8c577.

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Labels: bug (Something isn't working)

Successfully merging this pull request may close these issues: [BUG] Integration tests run with default configs that don't match RapidsConf defaults

3 participants