Add Spark 4.0.0 Build Profile and Other Supporting Changes [databricks] (#10994)

* POM changes for Spark 4.0.0

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* validate buildver and scala versions

* more pom changes

* fixed the scala-2.12 comment

* more fixes for scala-2.13 pom

* addressed comments

* add in shim check to account for 400

* add 400 for premerge tests against jdk 17

* temporarily remove 400 from snapshotScala213

* fixed 2.13 pom

* Remove 400 from jdk17 as it will compile with Scala 2.12

* github workflow changes

* added quotes to pom-directory

* update version defs to include scala 213 jdk 17

* Cross-compile all shims from JDK17 to JDK8

Eliminate Logging inheritance to prevent shimming of unshimmable API
classes

Signed-off-by: Gera Shegalov <gera@apache.org>
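
A minimal sketch of the Logging change, assuming SLF4J is on the classpath (the class name is hypothetical, not one of the actual classes touched): instead of mixing in Spark's org.apache.spark.internal.Logging trait, which drags an unshimmable Spark-internal class into the API module, the class owns a logger instance.

    import org.slf4j.{Logger, LoggerFactory}

    // Before (problematic): `class PluginChecker extends Logging` inherits
    // org.apache.spark.internal.Logging, a Spark-internal trait that cannot
    // be shimmed, into the unshimmed API surface.
    // After: hold the logger instead of inheriting it.
    class PluginChecker {
      @transient private lazy val log: Logger =
        LoggerFactory.getLogger(getClass)

      def check(name: String): Unit = {
        log.debug(s"checking $name")
      }
    }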

* dummy

* undo api pom change

Signed-off-by: Gera Shegalov <gera@apache.org>

* Add preview1 to the allowed shim versions

Signed-off-by: Gera Shegalov <gera@apache.org>

* Scala 2.13 to require JDK17

Signed-off-by: Gera Shegalov <gera@apache.org>

* Removed unused import left over from razajafri#3

* Setup JAVA_HOME before caching

* Only upgrade the Scala plugin for Scala 2.13

* Regenerate Scala 2.13 poms

* Remove 330 from JDK17 builds for Scala 2.12

* Revert "Remove 330 from JDK17 builds for Scala 2.12"

This reverts commit 1faabd4.

* Downgrade scala.plugin.version for cloudera

* Updated comment to include the issue

* Upgrading the scala.maven.plugin version to 4.9.1, which is the same as Spark 4.0.0

* Downgrade scala-maven-plugin for Cloudera

* revert mvn verify changes

* Avoid cache for JDK 17

* removed cache dep from scala 213

* Added Scala 2.13 specific checks

* Handle the change for UnaryPositive now extending RuntimeReplaceable
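
In Spark 4.0.0, UnaryPositive extends RuntimeReplaceable, i.e. it is rewritten to a replacement expression before execution. A hedged sketch of the resulting handling (illustrative only, not the plugin's actual GpuOverrides code): since +x evaluates to x, a shim can substitute the child expression directly.

    import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryPositive}

    // Sketch: UnaryPositive is semantically the identity on its child,
    // so an override can strip it and plan the child instead.
    def replaceUnaryPositive(expr: Expression): Expression = expr match {
      case UnaryPositive(child) => child // +x evaluates to x
      case other => other
    }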

* Removing 330 from jdk17.buildvers as we only support Scala 2.13, and fixing the environment variable in version-def.sh that we read for building against JDK17 with Scala 2.13

* Update Scala 2.13 poms

* fixed scala2.13 verify to actually use the scala2.13/pom.xml

* Added missing csv files

* Skip Opcode tests

There is a bytecode incompatibility, which is why we are skipping these
tests until we add support for it. For details, please see the following
two issues:
#11174
#10203
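
A sketch of the skip pattern, with illustrative names (the actual suite and guard differ): gate the affected tests on the Scala binary version so they are canceled rather than failed until bytecode support is added.

    import org.scalatest.funsuite.AnyFunSuite

    class OpcodeSuiteSketch extends AnyFunSuite {
      // Scala 2.13 emits different bytecode (e.g. invokedynamic-based
      // lambdas) than the UDF bytecode translator currently understands.
      private val isScala213 =
        scala.util.Properties.versionNumberString.startsWith("2.13")

      test("compile simple udf") {
        assume(!isScala213,
          "skipped due to bytecode incompatibility (#11174, #10203)")
        // ... real test body would go here ...
      }
    }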

* upmerged and fixed the newly introduced compile error

* addressed review comments

* Removed jdk17 cloudera check and moved it inside the 321, 330, and 332 cloudera profiles

* fixed upmerge conflicts

* reverted renaming of id

* Fixed HiveGenericUDFShim

* addressed review comments

* reverted the debugging code

* generated Scala 2.13 poms

---------

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Gera Shegalov <gera@apache.org>
Co-authored-by: Gera Shegalov <gera@apache.org>
razajafri and gerashegalov committed Jul 16, 2024
1 parent 156ee51 commit a41a616
Showing 34 changed files with 1,641 additions and 140 deletions.
126 changes: 90 additions & 36 deletions .github/workflows/mvn-verify-check.yml
@@ -40,10 +40,9 @@ jobs:
runs-on: ubuntu-latest
outputs:
dailyCacheKey: ${{ steps.generateCacheKey.outputs.dailyCacheKey }}
defaultSparkVersion: ${{ steps.allShimVersionsStep.outputs.defaultSparkVersion }}
sparkTailVersions: ${{ steps.allShimVersionsStep.outputs.tailVersions }}
sparkJDKVersions: ${{ steps.allShimVersionsStep.outputs.jdkVersions }}
scala213Versions: ${{ steps.allShimVersionsStep.outputs.scala213Versions }}
defaultSparkVersion: ${{ steps.all212ShimVersionsStep.outputs.defaultSparkVersion }}
sparkTailVersions: ${{ steps.all212ShimVersionsStep.outputs.tailVersions }}
sparkJDKVersions: ${{ steps.all212ShimVersionsStep.outputs.jdkVersions }}
steps:
- uses: actions/checkout@v4 # refs/pull/:prNumber/merge
- uses: actions/setup-java@v4
@@ -69,7 +68,7 @@
set -x
max_retry=3; delay=30; i=1
while true; do
for pom in pom.xml scala2.13/pom.xml
for pom in pom.xml
do
mvn ${{ env.COMMON_MVN_FLAGS }} --file $pom help:evaluate -pl dist \
-Dexpression=included_buildvers \
@@ -89,7 +88,7 @@
}
done
- name: all shim versions
id: allShimVersionsStep
id: all212ShimVersionsStep
run: |
set -x
. jenkins/version-def.sh
@@ -113,30 +112,12 @@
jdkHeadVersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":8}" "${SPARK_BASE_SHIM_VERSION}")
# jdk11
jdk11VersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":11}" "${SPARK_SHIM_VERSIONS_JDK11[@]}")
# jdk17
jdk17VersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":17}" "${SPARK_SHIM_VERSIONS_JDK17[@]}")
# jdk
jdkVersionArrBody=$jdkHeadVersionArrBody$jdk11VersionArrBody$jdk17VersionArrBody
jdkVersionArrBody=$jdkHeadVersionArrBody$jdk11VersionArrBody
jdkVersionArrBody=${jdkVersionArrBody:1}
jdkVersionJsonStr=$(printf {\"include\":[%s]} $jdkVersionArrBody)
echo "jdkVersions=$jdkVersionJsonStr" >> $GITHUB_OUTPUT
SCALA_BINARY_VER=2.13
. jenkins/version-def.sh
svArrBodyNoSnapshot=$(printf ",{\"spark-version\":\"%s\",\"isSnapshot\":false}" "${SPARK_SHIM_VERSIONS_NOSNAPSHOTS[@]}")
svArrBodyNoSnapshot=${svArrBodyNoSnapshot:1}
# get private artifact version
privateVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout)
# do not add empty snapshot versions or when private version is released one (does not include snapshot shims)
if [[ ${#SPARK_SHIM_VERSIONS_SNAPSHOTS_ONLY[@]} -gt 0 && $privateVer == *"-SNAPSHOT" ]]; then
svArrBodySnapshot=$(printf ",{\"spark-version\":\"%s\",\"isSnapshot\":true}" "${SPARK_SHIM_VERSIONS_SNAPSHOTS_ONLY[@]}")
svArrBodySnapshot=${svArrBodySnapshot:1}
svJsonStr=$(printf {\"include\":[%s]} $svArrBodyNoSnapshot,$svArrBodySnapshot)
else
svJsonStr=$(printf {\"include\":[%s]} $svArrBodyNoSnapshot)
fi
echo "scala213Versions=$svJsonStr" >> $GITHUB_OUTPUT
package-tests:
needs: cache-dependencies
@@ -187,27 +168,51 @@ jobs:
}
done
set-scala213-versions:
runs-on: ubuntu-latest
outputs:
scala213Versions: ${{ steps.all213ShimVersionsStep.outputs.scala213Versions }}
sparkJDK17Versions: ${{ steps.all213ShimVersionsStep.outputs.jdkVersions }}
steps:
- uses: actions/checkout@v4 # refs/pull/:prNumber/merge

- id: all213ShimVersionsStep
run: |
set -x
SCALA_BINARY_VER=2.13
. jenkins/version-def.sh
svArrBodyNoSnapshot=$(printf ",{\"spark-version\":\"%s\",\"isSnapshot\":false}" "${SPARK_SHIM_VERSIONS_NOSNAPSHOTS[@]}")
svArrBodyNoSnapshot=${svArrBodyNoSnapshot:1}
# get private artifact version
privateVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout)
svJsonStr=$(printf {\"include\":[%s]} $svArrBodyNoSnapshot)
echo "scala213Versions=$svJsonStr" >> $GITHUB_OUTPUT
# jdk17
jdk17VersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":17}" "${SPARK_SHIM_VERSIONS_JDK17_SCALA213[@]}")
jdkVersionArrBody=$jdk17VersionArrBody
jdkVersionArrBody=${jdkVersionArrBody:1}
jdkVersionJsonStr=$(printf {\"include\":[%s]} $jdkVersionArrBody)
echo "jdkVersions=$jdkVersionJsonStr" >> $GITHUB_OUTPUT
package-tests-scala213:
needs: cache-dependencies
needs: set-scala213-versions
continue-on-error: ${{ matrix.isSnapshot }}
strategy:
matrix: ${{ fromJSON(needs.cache-dependencies.outputs.scala213Versions) }}
matrix: ${{ fromJSON(needs.set-scala213-versions.outputs.scala213Versions) }}
fail-fast: false
runs-on: ubuntu-latest
steps:

- uses: actions/checkout@v4 # refs/pull/:prNumber/merge

- name: Setup Java and Maven Env
uses: actions/setup-java@v4
with:
distribution: adopt
java-version: 8

- name: Cache local Maven repository
uses: actions/cache@v4
with:
path: ~/.m2
key: ${{ needs.cache-dependencies.outputs.dailyCacheKey }}
java-version: 17

- name: check runtime before tests
run: |
@@ -218,7 +223,7 @@
run: |
# https://github.com/NVIDIA/spark-rapids/issues/8847
# specify expected versions
export JAVA_HOME=${JAVA_HOME_8_X64}
export JAVA_HOME=${JAVA_HOME_17_X64}
export PATH=${JAVA_HOME}/bin:${PATH}
java -version && mvn --version && echo "ENV JAVA_HOME: $JAVA_HOME, PATH: $PATH"
# verify Scala 2.13 build files
@@ -246,8 +251,57 @@ jobs:
}
done
verify-213-modules:
needs: set-scala213-versions
runs-on: ubuntu-latest
strategy:
matrix: ${{ fromJSON(needs.set-scala213-versions.outputs.sparkJDK17Versions) }}
steps:
- uses: actions/checkout@v4 # refs/pull/:prNumber/merge

- name: Setup Java and Maven Env
uses: actions/setup-java@v4
with:
distribution: adopt
java-version: 17

- name: check runtime before tests
run: |
env | grep JAVA
java -version && mvn --version && echo "ENV JAVA_HOME: $JAVA_HOME, PATH: $PATH"
- name: Build JDK
run: |
# https://github.com/NVIDIA/spark-rapids/issues/8847
# specify expected versions
export JAVA_HOME=${JAVA_HOME_${{ matrix.java-version }}_X64}
export PATH=${JAVA_HOME}/bin:${PATH}
java -version && mvn --version && echo "ENV JAVA_HOME: $JAVA_HOME, PATH: $PATH"
# verify Scala 2.13 build files
./build/make-scala-version-build-files.sh 2.13
# verify git status
if [ -n "$(echo -n $(git status -s | grep 'scala2.13'))" ]; then
git add -N scala2.13/* && git diff 'scala2.13/*'
echo "Generated Scala 2.13 build files don't match what's in repository"
exit 1
fi
# change to Scala 2.13 Directory
cd scala2.13
# test command, will retry for 3 times if failed.
max_retry=3; delay=30; i=1
while true; do
mvn verify \
-P "individual,pre-merge" -Dbuildver=${{ matrix.spark-version }} \
${{ env.COMMON_MVN_FLAGS }} && break || {
if [[ $i -le $max_retry ]]; then
echo "mvn command failed. Retry $i/$max_retry."; ((i++)); sleep $delay; ((delay=delay*2))
else
echo "mvn command failed. Exit 1"; exit 1
fi
}
done
verify-all-modules:
verify-all-212-modules:
needs: cache-dependencies
runs-on: ubuntu-latest
strategy:
19 changes: 19 additions & 0 deletions aggregator/pom.xml
@@ -715,5 +715,24 @@
</dependency>
</dependencies>
</profile>
<!-- #if scala-2.13 --><!--
<profile>
<id>release400</id>
<activation>
<property>
<name>buildver</name>
<value>400</value>
</property>
</activation>
<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-delta-stub_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<classifier>${spark.version.classifier}</classifier>
</dependency>
</dependencies>
</profile>
--><!-- #endif scala-2.13 -->
</profiles>
</project>
3 changes: 1 addition & 2 deletions build/buildall
@@ -161,7 +161,6 @@ if [[ "$DIST_PROFILE" == *Scala213 ]]; then
SCALA213=1
fi


# include options to mvn command
export MVN="mvn -Dmaven.wagon.http.retryHandler.count=3 ${MVN_OPT}"

@@ -196,7 +195,7 @@ case $DIST_PROFILE in
SPARK_SHIM_VERSIONS=($(versionsFromDistProfile "minimumFeatureVersionMix"))
;;

3*)
[34]*)
<<< $DIST_PROFILE IFS="," read -ra SPARK_SHIM_VERSIONS
INCLUDED_BUILDVERS_OPT="-Dincluded_buildvers=$DIST_PROFILE"
unset DIST_PROFILE
4 changes: 3 additions & 1 deletion build/shimplify.py
@@ -188,6 +188,7 @@ def __csv_as_arr(str_val):
__dirs_to_derive_shims = sorted(__csv_ant_prop_as_arr('shimplify.dirs'))

__all_shims_arr = sorted(__csv_ant_prop_as_arr('all.buildvers'))
__allScala213_shims_arr = sorted(__csv_ant_prop_as_arr('allScala213.buildvers'))

__log = logging.getLogger('shimplify')
__log.setLevel(logging.DEBUG if __should_trace else logging.INFO)
@@ -372,7 +373,8 @@ def __generate_symlinks():

def __map_version_array(shim_json_string):
shim_ver = str(json.loads(shim_json_string).get('spark'))
assert shim_ver in __all_shims_arr, "all.buildvers in pom.xml does not contain %s" % shim_ver
assert shim_ver in __all_shims_arr or shim_ver in __allScala213_shims_arr, "all.buildvers or " \
"allScala213.buildvers in pom.xml does not contain %s" % shim_ver
return shim_ver

def __traverse_source_tree_of_all_shims(src_type, func):
8 changes: 8 additions & 0 deletions dist/pom.xml
@@ -110,6 +110,14 @@
</included_buildvers>
</properties>
</profile>
<profile>
<id>jdk17-scala213-test</id>
<properties>
<included_buildvers>
${jdk17.scala213.buildvers}
</included_buildvers>
</properties>
</profile>
<profile>
<id>jdk17-test</id>
<properties>
10 changes: 5 additions & 5 deletions dist/scripts/binary-dedupe.sh
@@ -168,12 +168,12 @@ function verify_same_sha_for_unshimmed() {
# TODO currently RapidsShuffleManager is "removed" from /spark* by construction in
# dist pom.xml via ant. We could delegate this logic to this script
# and make both simpler
if [[ ! "$class_file_quoted" =~ (com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class|org/apache/spark/sql/rapids/shims/spark[34].*/ProxyRapidsShuffleInternalManager.class) ]]; then
if [[ ! "$class_file_quoted" =~ com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class ]]; then

if ! grep -q "/spark.\+/$class_file_quoted" "$SPARK_SHARED_TXT"; then
echo >&2 "$class_file is not bitwise-identical across shims"
exit 255
fi
if ! grep -q "/spark.\+/$class_file_quoted" "$SPARK_SHARED_TXT"; then
echo >&2 "$class_file is not bitwise-identical across shims"
exit 255
fi
fi
}

42 changes: 35 additions & 7 deletions jdk-profiles/pom.xml
@@ -31,17 +31,45 @@
<version>24.08.0-SNAPSHOT</version>
<profiles>
<profile>
<id>jdk9plus</id>
<properties>
<scala.plugin.version>4.6.1</scala.plugin.version>
<maven.compiler.source>${java.specification.version}</maven.compiler.source>
<maven.compiler.release>${maven.compiler.source}</maven.compiler.release>
<maven.compiler.target>${maven.compiler.source}</maven.compiler.target>
</properties>
<id>jdk8</id>
<activation>
<jdk>8</jdk>
</activation>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala.plugin.version}</version>
<configuration>
<target>${java.major.version}</target>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</profile>
<profile>
<id>jdk9plus</id>
<activation>
<!-- activate for all java versions after 9 -->
<jdk>[9,)</jdk>
</activation>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala.plugin.version}</version>
<configuration>
<release>${java.major.version}</release>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</profile>
</profiles>
</project>
3 changes: 3 additions & 0 deletions jenkins/version-def.sh
@@ -125,6 +125,9 @@ SPARK_SHIM_VERSIONS_JDK11=("${SPARK_SHIM_VERSIONS_ARR[@]}")
# jdk17 cases
set_env_var_SPARK_SHIM_VERSIONS_ARR -Pjdk17-test
SPARK_SHIM_VERSIONS_JDK17=("${SPARK_SHIM_VERSIONS_ARR[@]}")
# jdk17 scala213 cases
set_env_var_SPARK_SHIM_VERSIONS_ARR -Pjdk17-scala213-test
SPARK_SHIM_VERSIONS_JDK17_SCALA213=("${SPARK_SHIM_VERSIONS_ARR[@]}")
# databricks shims
set_env_var_SPARK_SHIM_VERSIONS_ARR -Pdatabricks
SPARK_SHIM_VERSIONS_DATABRICKS=("${SPARK_SHIM_VERSIONS_ARR[@]}")