SPARK-3052. Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop #1956
Conversation
Lowering the log level hides it, but what's the cause of these issues?
QA tests have started for PR 1956. This patch merges cleanly.
This occurs when an executor process shuts down while tasks are still executing (e.g. because the driver disassociated, or because of an OOME). Hadoop FileSystems register a shutdown hook to close themselves, and RecordReaders get closed in a finally block after the tasks they're used in finish. So there's a race between the two, and I can't think of a good way to make one reliably execute after the other.

I'm a little confused as to why the HadoopRDD finally block is running at all. Some googling seems to indicate that finally blocks don't run during a System.exit(), and I would think a shutdown hook would run after that happens anyway, so I can't claim to have 100% understanding of what's going on here. Spark isn't closing the FileSystem on its own.

More generally, I think logging a warning is overkill for a reader close error.
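The comment above wonders whether finally blocks run during a System.exit(). A quick standalone check (not Spark code, just a minimal JVM demo) illustrates the behavior being described: once System.exit() is called, shutdown hooks run but the pending finally block does not.

```java
// Minimal demo: finally blocks do NOT run when System.exit() is called
// mid-try; registered shutdown hooks DO run.
public class ExitDemo {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.out.println("shutdown hook ran")));
        try {
            System.out.println("in try");
            System.exit(0); // JVM begins shutdown here
        } finally {
            System.out.println("finally ran"); // never printed
        }
    }
}
```

This matches the observation that a FileSystem's shutdown hook can close the FileSystem while a task's cleanup path is still (or separately) trying to close a RecordReader backed by it.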
Ah, and the order they should be shut down in is RecordReader then …

Thanks for catching this -- I've seen it myself and was wondering why the …
QA results for PR 1956:
Right
@sryza what's the stack trace printed here? I think it would be better to check whether we're shutting down (with Utils.inShutdown) and log a warning if we're not shutting down. A failed close() seems bad in other situations.
Here's the exception:
Thanks, I wasn't aware of Utils.inShutdown. I'll post a patch that uses that. I haven't yet figured out how to reliably reproduce this, so I can't verify that it will safeguard against the warning in all situations where it should, but it seems like an improvement. |
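The fix direction discussed above can be sketched as follows. This is a hedged illustration, not Spark's actual code: as far as I know, Utils.inShutdown at the time used the trick of trying to register a shutdown hook, since the JVM rejects new hooks with an IllegalStateException once shutdown has begun. The closeQuietly name and the mock reader are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;

public class ShutdownAwareClose {
    // Returns true if the JVM has begun shutting down. Trick: the JVM
    // refuses to register new shutdown hooks once shutdown is in progress.
    static boolean inShutdown() {
        try {
            Thread hook = new Thread(() -> { });
            Runtime.getRuntime().addShutdownHook(hook);
            Runtime.getRuntime().removeShutdownHook(hook);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Close a RecordReader-like resource, but only log a warning when the
    // failure cannot be explained by a racing FileSystem shutdown hook.
    static void closeQuietly(Closeable reader) {
        try {
            reader.close();
        } catch (IOException e) {
            if (!inShutdown()) {
                System.err.println("WARN: exception while closing reader: " + e);
            }
            // During shutdown, "Filesystem closed" is expected noise; stay quiet.
        }
    }

    public static void main(String[] args) {
        // Simulate a reader whose underlying FileSystem was already closed.
        Closeable alreadyClosed = () -> { throw new IOException("Filesystem closed"); };
        closeQuietly(alreadyClosed); // warns, since we are not shutting down
        System.out.println("inShutdown=" + inShutdown());
    }
}
```

The design point matches the discussion: the close failure itself is unavoidable during shutdown, so the patch suppresses the misleading warning rather than trying to order the cleanup against Hadoop's shutdown hook.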
Force-pushed "… a job fails while reading from Hadoop" from 073da37 to 815813a.
QA tests have started for PR 1956 at commit
QA tests have finished for PR 1956 at commit
I believe the failure is unrelated. I noticed it on SPARK-2461 as well.
Thanks Sandy, merged this.
SPARK-3052. Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop

Author: Sandy Ryza <sandy@cloudera.com>

Closes apache#1956 from sryza/sandy-spark-3052 and squashes the following commits:

815813a [Sandy Ryza] SPARK-3052. Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop