
[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7 #20584

Closed

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

This test only fails with sbt on Hadoop 2.7. I can't reproduce it locally, but here is my speculation from looking at the code:

  1. FileSystem.delete doesn't delete the directory entirely; somehow we can still open the file as a 0-length empty file (just speculation).
  2. ORC intentionally allows empty files, and the reader fails during reading without closing the file stream.

This PR improves the test to make sure all files are deleted and can't be opened.

How was this patch tested?

N/A

@cloud-fan
Contributor Author

val df = spark.read.format(format).load(
  new Path(basePath, "first").toString,
  new Path(basePath, "second").toString,
  new Path(basePath, "third").toString)

val fs = thirdPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
// Make sure all data files are deleted and can't be opened.
files.foreach(f => fs.delete(f, false))
assert(fs.delete(thirdPath, true))
Member


Hmmmm... but it asserts true on the delete completion. I would be surprised if that's something not guaranteed.
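For context, a minimal sketch of what "deleted and can't be opened" could look like as an explicit check, reusing the fs and files values from the snippet above. The assertFullyDeleted helper name is hypothetical and the intercept helper is ScalaTest's; this is illustrative, not the PR's actual diff.

```scala
import java.io.FileNotFoundException
import org.apache.hadoop.fs.{FileSystem, Path}
import org.scalatest.Assertions._

// Hypothetical helper: verify deletion really succeeded, beyond delete()
// merely returning true.
def assertFullyDeleted(fs: FileSystem, files: Seq[Path]): Unit = {
  files.foreach { f =>
    // The path must be gone...
    assert(!fs.exists(f))
    // ...and opening it must fail, rather than yielding a 0-length stream.
    intercept[FileNotFoundException] {
      fs.open(f).close()
    }
  }
}
```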

@HyukjinKwon
Member

BTW, my rough wild guess was that case 2 (reading the file but not closing it) happens in the schema inference path.

@gatorsmile
Member

gatorsmile commented Feb 12, 2018

LGTM. I would merge this first if Jenkins does not fail, and see whether this can help fix the flaky tests.

@HyukjinKwon
Member

I won't get in the way, but I am less sure about this. I thought this was also flaky in the PR builder anyway.

@cloud-fan
Contributor Author

cloud-fan commented Feb 12, 2018

> BTW, my rough wild guess was that case 2 (reading the file but not closing it) happens in the schema inference path.

According to the log, the leaked file stream was created when building the ORC columnar reader. Schema inference is fine.

@HyukjinKwon
Member

You are right. I have run out of ideas. LGTM too for giving this a try, since it happens more frequently in spark-branch-2.3-test-sbt-hadoop-2.7.

@sameeragarwal
Member

LGTM, seems plausible!

@kiszk
Member

kiszk commented Feb 12, 2018

I have also been thinking about this, and I agree with it.

> According to the log, the leaked file stream was created when building the ORC columnar reader.

I am suspicious about the relationship between afterEach() and addTaskCompletionListener (which calls close()), but I'm not sure. Let's try this approach first.
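For readers following the discussion, a simplified sketch of the pattern in question: Spark readers typically tie stream cleanup to task completion via a listener. The StreamClosingIterator class below is illustrative, not Spark's actual internals; the point is that if the listener fires late (or not at all), the stream is still open when the test's afterEach() check runs.

```scala
import java.io.Closeable
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

// Illustrative wrapper: close the underlying stream when the task finishes.
class StreamClosingIterator[T](underlying: Iterator[T], stream: Closeable)
    extends Iterator[T] {
  // Register cleanup against the running task, if any.
  Option(TaskContext.get()).foreach { ctx =>
    ctx.addTaskCompletionListener(new TaskCompletionListener {
      override def onTaskCompletion(context: TaskContext): Unit = stream.close()
    })
  }
  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = underlying.next()
}
```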

@SparkQA

SparkQA commented Feb 12, 2018

Test build #87321 has finished for PR 20584 at commit 51bb48a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Member

Merging this to master/2.3. Thanks!

asfgit pushed a commit that referenced this pull request Feb 12, 2018
[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7

Author: Wenchen Fan <wenchen@databricks.com>

Closes #20584 from cloud-fan/flaky-test.

(cherry picked from commit 6efd5d1)
Signed-off-by: Sameer Agarwal <sameerag@apache.org>
asfgit closed this in 6efd5d1 on Feb 12, 2018
@cloud-fan
Contributor Author

> I am suspicious about the relationship between afterEach() and addTaskCompletionListener (which calls close()), but I'm not sure. Let's try this approach first.

This is one of my speculations. There are two possibilities I can think of: 1) the task completion listener is not called before afterEach; 2) the ORC columnar reader's close method doesn't close the file stream.

For 1), it seems we've fixed it in c5a31d1. For 2), I'm not sure and may need help from the ORC folks.
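As a purely hypothetical illustration of possibility 2), and not the actual ORC reader code: the bug would amount to a close() that releases some resources but forgets the underlying stream.

```scala
import java.io.Closeable

// Hypothetical reader shape (names invented for illustration): if close()
// only releases batch-decoding resources, the file handle leaks whenever a
// read fails partway through.
class ColumnarReaderSketch(fileStream: Closeable, batchResources: Closeable)
    extends Closeable {
  override def close(): Unit = {
    batchResources.close()
    // Hypothesized bug: fileStream.close() is never called here.
  }
}
```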

@HyukjinKwon
Member

HyukjinKwon commented Feb 12, 2018

I think I rushed when I took a look initially. Thanks for fixing this.

robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 12, 2018
[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#20584 from cloud-fan/flaky-test.
@dongjoon-hyun
Member

dongjoon-hyun commented Feb 12, 2018

My bad. Thank you, guys. I'll investigate the following more:

> According to the log, the leaked file stream was created when building the ORC columnar reader.

@dongjoon-hyun
Member

dongjoon-hyun commented Feb 12, 2018

This patch helps branch-2.3/sbt/hadoop-2.7, so I'm closely monitoring the latest consecutive failures of the sbt/hadoop-2.6 builds on the master branch, too.

  • 4210 Passed!
  • 4209 Failed with FileBasedDataSourceSuite and ParquetQuerySuite
  • 4208 This patch landed here but failed with StreamingOuterJoinSuite and ReceiverSuite.
  • 4207 Failed with ParquetQuerySuite
  • 4206 Failed with BufferHolderSparkSubmitSuite

@dongjoon-hyun
Member

For the following case, I'll make a PR for the Spark ORC columnar reader very soon.

> 2) the ORC columnar reader's close method doesn't close the file stream.

@dongjoon-hyun
Member

I created a PR, #20590.
