[SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite should verify the downloaded file

## What changes were proposed in this pull request?

This is a backport of #21210 because `branch-2.2` also faces the same failures.

Although [SPARK-22654](https://issues.apache.org/jira/browse/SPARK-22654) made `HiveExternalCatalogVersionsSuite` retry the download from Apache mirrors up to three times, the suite has remained flaky because it never verified the downloaded file. Some Apache mirrors terminate the download abnormally, and the *corrupted* file produces the following errors.

```
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
22:46:32.700 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer =====

*** RUN ABORTED ***
  java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory
```
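The `gzip: stdin: not in gzip format` error above typically means the mirror returned something other than a gzip archive, such as a truncated file or an HTML error page. As a minimal illustration (not part of the suite; `looksLikeGzip` is a hypothetical helper), such a file can be spotted up front by checking the two-byte gzip magic number `0x1f 0x8b`:

```scala
import java.io.{File, FileInputStream}

// Hypothetical helper, not from the suite: every gzip stream starts with the
// magic bytes 0x1f 0x8b, so a mirror's HTML error page or an empty file
// fails this check immediately.
def looksLikeGzip(f: File): Boolean = {
  val in = new FileInputStream(f)
  try {
    val header = new Array[Byte](2)
    in.read(header) == 2 && header(0) == 0x1f.toByte && header(1) == 0x8b.toByte
  } finally in.close()
}
```

The suite's actual fix relies on `tar`'s exit code plus a file-existence check instead, which also catches archives that are valid gzip but truncated mid-stream.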

This failure has been reported in two confusingly different ways; for example, the case above is reported as Case 2, `no failures`.

- Case 1. [Test Result (1 failure / +1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/)
- Case 2. [Test Result (no failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/)

This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by verifying the downloaded `tgz` file: it extracts the archive and checks that `bin/spark-submit` exists. If the file turns out to be empty or corrupted, `HiveExternalCatalogVersionsSuite` retries the download, just as it does for a download failure.
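The control flow described above can be sketched roughly as follows (illustrative names, not the suite's exact code): an attempt only counts as a success if `tar` exited 0 *and* `bin/spark-submit` exists afterwards, because `tar` also exits 0 for an empty input file.

```scala
import java.io.File

// Rough sketch with illustrative names -- not the suite's exact code.
object DownloadRetrySketch {
  // An extracted distribution is trusted only if tar exited 0 AND the
  // expected binary is present (tar returns 0 for an empty input file).
  def isValidSparkDist(targetDir: File, tarExitCode: Int): Boolean =
    tarExitCode == 0 && new File(targetDir, "bin/spark-submit").exists()

  // Run `attempt` up to maxAttempts times, stopping at the first success;
  // a failed download and a corrupted archive are handled identically.
  def withRetries(maxAttempts: Int)(attempt: Int => Boolean): Boolean =
    (0 until maxAttempts).exists(attempt)
}
```

Treating a corrupted archive exactly like a failed download keeps the retry loop simple: the next iteration re-resolves the preferred mirror and starts over.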

## How was this patch tested?

Passed the Jenkins tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #21232 from dongjoon-hyun/SPARK-23489-2.
dongjoon-hyun authored and gatorsmile committed May 4, 2018
1 parent 154bbc9 commit 768d0b7
Showing 1 changed file with 18 additions and 17 deletions.
```diff
@@ -57,30 +57,31 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     for (i <- 0 until 3) {
       val preferredMirror =
         Seq("wget", "https://www.apache.org/dyn/closer.lua?preferred=true", "-q", "-O", "-").!!.trim
-      val url = s"$preferredMirror/spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
+      val filename = s"spark-$version-bin-hadoop2.7.tgz"
+      val url = s"$preferredMirror/spark/spark-$version/$filename"
       logInfo(s"Downloading Spark $version from $url")
       if (Seq("wget", url, "-q", "-P", path).! == 0) {
-        return
+        val downloaded = new File(sparkTestingDir, filename).getCanonicalPath
+        val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
+
+        Seq("mkdir", targetDir).!
+        val exitCode = Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
+        Seq("rm", downloaded).!
+
+        // For a corrupted file, `tar` returns non-zero values. However, we also need to check
+        // the extracted file because `tar` returns 0 for empty file.
+        val sparkSubmit = new File(sparkTestingDir, s"spark-$version/bin/spark-submit")
+        if (exitCode == 0 && sparkSubmit.exists()) {
+          return
+        } else {
+          Seq("rm", "-rf", targetDir).!
+        }
       }
       logWarning(s"Failed to download Spark $version from $url")
     }
     fail(s"Unable to download Spark $version")
   }
 
-
-  private def downloadSpark(version: String): Unit = {
-    tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
-
-    val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
-    val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
-
-    Seq("mkdir", targetDir).!
-
-    Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
-
-    Seq("rm", downloaded).!
-  }
-
   private def genDataDir(name: String): String = {
     new File(tmpDataDir, name).getCanonicalPath
   }
@@ -125,7 +126,7 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     PROCESS_TABLES.testingVersions.zipWithIndex.foreach { case (version, index) =>
       val sparkHome = new File(sparkTestingDir, s"spark-$version")
       if (!sparkHome.exists()) {
-        downloadSpark(version)
+        tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
       }
 
       val args = Seq(
```
