
Improve Databricks runtime shim detection #8587

Closed
gerashegalov opened this issue Jun 20, 2023 · 3 comments · Fixed by #11455
Labels: improve, task (Work required that improves the product but is not user facing)

Comments
@gerashegalov (Collaborator):
We currently rely on prefix matching of the version string:

def matchesVersion(version: String): Boolean = {
  SparkEnv.get.conf.get("spark.databricks.clusterUsageTags.sparkVersion", "").startsWith("12.2.")
}

whose values are documented in the Databricks spark-versions API. These version strings are wildcards that resolve to the latest patch release of a given major.minor line, such as 11.3.x.

Thus, a user of an older rapids-4-spark artifact may hit a runtime bug or, worse, a silent defect instead of the clear, actionable message implemented in #8521.

The Spark UI on DBR displays "Build Properties" in the Environment tab:

Name                 Value
Runtime Build Hash   383fa9ccdbf99891a97ff2c546d4330d923a6d82
Universe Build Hash  e3f8b198b7c7c313f95719b7f41d3503780d4a4d

These values correspond to

org.apache.spark.BuildInfo.gitHash
com.databricks.BuildInfo.gitHash

in a Scala notebook

which can be utilized for patch-version detection.
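The idea of keying on the two build hashes rather than the version-string prefix can be sketched as follows. This is a minimal, hypothetical Python illustration: `KNOWN_BUILDS` and `detect_build` are invented names, not plugin APIs, and the only real values are the two hashes quoted from the Spark UI above.

```python
# Hypothetical sketch: identify the exact DBR patch build by its git hashes
# instead of the major.minor prefix of
# spark.databricks.clusterUsageTags.sparkVersion.
# KNOWN_BUILDS and detect_build are illustrative names, not plugin APIs.

KNOWN_BUILDS = {
    # (Runtime Build Hash, Universe Build Hash) -> tested DBR build label
    ("383fa9ccdbf99891a97ff2c546d4330d923a6d82",
     "e3f8b198b7c7c313f95719b7f41d3503780d4a4d"): "12.2.x (tested patch)",
}

def detect_build(runtime_hash: str, universe_hash: str) -> str:
    """Return the known build label, or a clear actionable message."""
    key = (runtime_hash, universe_hash)
    if key in KNOWN_BUILDS:
        return KNOWN_BUILDS[key]
    return ("untested Databricks maintenance update; "
            "verify plugin support before relying on this jar")
```

Prefix matching would accept any 12.2.x patch, tested or not; an exact-hash lookup distinguishes the specific maintenance build and can fail loudly otherwise.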

gerashegalov added the task and improve labels on Jun 20, 2023
@gerashegalov (Collaborator, Author):
We can also improve the reliability of the released spark-rapids jars using the nightly pipeline of the pending release.

We know that the semi-monthly/bi-weekly maintenance updates to Databricks Runtimes can break released spark-rapids plugin code.
Usually the breakage is more subtle than an outright API break (#10070 (comment)).

Ideally we want to retest the jar version already used by customers upon every maintenance update. However, testing is time consuming, so we do not want to retest the last N releases nightly. Suppose we do it on a weekly schedule: if, due to unfortunate sequencing, a test run completes just before the DBR update is pushed, it may take another week for the next run to catch new issues.
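The scheduling argument above can be made concrete with a tiny model. This is an illustrative sketch only; the function name and the model (runs every `interval` days, update pushed `offset` days after the last run) are assumptions introduced here.

```python
# Illustrative model of the detection window: released jars are retested
# every `interval` days, and a DBR maintenance update lands `offset` days
# after the most recent run. The defect stays undetected until the next run.
def detection_delay_days(interval: int, offset: int) -> int:
    """Days from the DBR update push until the next scheduled retest."""
    return interval - (offset % interval)
```

On a weekly schedule, an update pushed right after a run goes undetected for a full week (`detection_delay_days(7, 0) == 7`), while a nightly trigger bounds the window to a day.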

However, we can utilize the fact that our pending release already runs nightly tests on DBR to detect whether we need to kick off tests of the released artifacts.

We can maintain a table mapping each DB buildver to the last tested build hashes:

DB buildver DBR hashes tested
spark321db
spark330db
spark332db
spark341db

Somewhere in the source code, a test or ./integration_tests/run_pyspark_from_build.sh will log the current values of org.apache.spark.BuildInfo.gitHash and com.databricks.BuildInfo.gitHash.

The CI can then compare these to the last known values for the DB shim in the table and, on a mismatch, automatically kick off a pipeline for the released test jars, then update the table. This should shorten the detection window to a couple of days.
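The table-driven trigger described above can be sketched in Python. Everything here is hypothetical: the in-memory table, the function names, and the placeholder hashes are illustrative, not an existing spark-rapids script or API.

```python
# Hypothetical CI helper: compare the hashes logged by the nightly DBR run
# against the last tested hashes recorded per DB shim, and decide whether
# to launch the released-artifacts pipeline. All names are illustrative.

def needs_release_retest(buildver, current_hashes, last_tested):
    """True when the running DBR build differs from the last tested one."""
    return last_tested.get(buildver) != current_hashes

def record_tested(buildver, current_hashes, last_tested):
    """Update the table after the released-jar pipeline passes."""
    last_tested[buildver] = current_hashes

# Example: a maintenance update changed the Runtime Build Hash for spark341db.
table = {"spark341db": ("old-runtime-hash", "old-universe-hash")}
current = ("new-runtime-hash", "old-universe-hash")
if needs_release_retest("spark341db", current, table):
    # placeholder: here the CI would kick off the released-artifacts pipeline
    record_tested("spark341db", current, table)
```

An unknown buildver also triggers a retest (the lookup returns `None`, which never equals a hash pair), which is the safe default for newly added shims.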

@gerashegalov (Collaborator, Author) commented on Jul 16, 2024:

Update: the P0 part of this issue is to log the details of

org.apache.spark.BuildInfo
com.databricks.BuildInfo

and potentially more details, as documented for the SQL function select current_version (https://docs.databricks.com/en/sql/language-manual/functions/current_version.html#returns).

This should be logged via the Databricks shim service providers (com.nvidia.spark.rapids.shims.spark3XYdb.SparkShimServiceProvider) and in the CI logs.

@pxLi (Collaborator) commented on Jul 17, 2024:

link to #11184

gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Sep 10, 2024
Fixes NVIDIA#8587

- Match Version from the binaries
- Log build info exposed in current_version

Signed-off-by: Gera Shegalov <gera@apache.org>