[SPARK-31849][PYTHON][SQL] Make PySpark SQL exceptions more Pythonic #28661
Changes from all commits
```diff
@@ -18,8 +18,19 @@
 import py4j
 import sys

 from pyspark import SparkContext

+if sys.version_info.major >= 3:
+    unicode = str
+    # Disable exception chaining (PEP 3134) in captured exceptions
+    # in order to hide JVM stacktrace.
+    exec("""
+def raise_from(e):
+    raise e from None
+""")
+else:
+    def raise_from(e):
+        raise e
```
Comment on lines +32 to +34:

- Comment: So it seems PEP 3134 is only for Python 3.0+; we don't cut exception chaining in Python 2.7 with this.
- Reply: re: #28661 (comment) too. Yeah, in Python 2 there is no chaining; this is kind of a new feature in Python 3. e.g.) in the current master: `>>> sql("a")` … Python 3: `>>> sql("a")` …
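The chaining behavior discussed in this thread can be reproduced in plain Python 3: `raise ... from None` sets `__suppress_context__`, so the "During handling of the above exception..." block (the JVM-side noise, in PySpark's case) is not printed. A minimal standalone sketch, with made-up names, not the PR's code:

```python
# Demonstrates PEP 3134 exception chaining and how "raise ... from None"
# suppresses the chained traceback, as raise_from() does for captured
# JVM exceptions in this PR. All names here are illustrative.

def convert(internal_error):
    # Pretend this turns a low-level error into a user-facing one.
    return ValueError("user-facing message")

def chained():
    try:
        raise RuntimeError("noisy internal error")
    except RuntimeError as e:
        raise convert(e)            # implicit chaining: __context__ is set

def unchained():
    try:
        raise RuntimeError("noisy internal error")
    except RuntimeError as e:
        raise convert(e) from None  # chaining suppressed, as in raise_from()

try:
    unchained()
except ValueError as e:
    # __suppress_context__ is True, so the RuntimeError does not appear
    # in the printed traceback.
    print(e.__suppress_context__)  # True
```

On Python 2 there is nothing to suppress, which is why the Python 2 branch of `raise_from` is a plain `raise e`.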
```diff
 class CapturedException(Exception):

@@ -29,7 +40,11 @@ def __init__(self, desc, stackTrace, cause=None):
         self.cause = convert_exception(cause) if cause is not None else None

     def __str__(self):
+        sql_conf = SparkContext._jvm.org.apache.spark.sql.internal.SQLConf.get()
+        debug_enabled = sql_conf.pysparkJVMStacktraceEnabled()
         desc = self.desc
+        if debug_enabled:
+            desc = desc + "\nJVM stacktrace:\n%s" % self.stackTrace
         # encode unicode instance for python2 for human readable description
         if sys.version_info.major < 3 and isinstance(desc, unicode):
             return str(desc.encode('utf-8'))
```
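The new `__str__` only appends the JVM stacktrace when the SQL conf is on. A rough standalone sketch of that pattern (the `debug_enabled` class attribute stands in for the `pysparkJVMStacktraceEnabled()` conf lookup; this is not the actual PySpark class):

```python
class CapturedExceptionSketch(Exception):
    """Simplified model of CapturedException.__str__'s behavior."""

    # Stand-in for SQLConf.pysparkJVMStacktraceEnabled(); in Spark this is
    # read from "spark.sql.pyspark.jvmStacktrace.enabled" at call time.
    debug_enabled = False

    def __init__(self, desc, stack_trace):
        self.desc = desc
        self.stack_trace = stack_trace

    def __str__(self):
        desc = self.desc
        if self.debug_enabled:
            # Append the JVM stacktrace only in debug mode.
            desc = desc + "\nJVM stacktrace:\n%s" % self.stack_trace
        return desc

e = CapturedExceptionSketch("cannot resolve '`a`'", "at org.apache.spark...")
print(str(e))                      # Python-friendly message only
CapturedExceptionSketch.debug_enabled = True
print(str(e))                      # message plus the JVM stacktrace
```

Because the flag is consulted in `__str__` rather than at construction time, flipping the conf changes how already-captured exceptions render.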
```diff
@@ -67,6 +82,12 @@ class QueryExecutionException(CapturedException):
     """


+class PythonException(CapturedException):
+    """
+    Exceptions thrown from Python workers.
+    """
+
+
 class UnknownException(CapturedException):
     """
     None of the above exceptions.
```
```diff
@@ -75,21 +96,33 @@ class UnknownException(CapturedException):

 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()

+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()
```
Comment on lines +101 to +104:

- Comment: Not sure about this part — why is the change needed?
- Reply: The previous stacktrace wasn't actually quite correct; it hid the stacktrace from the executor side before. Now this PR handles an exception from the executor, so I needed to change this.
- Comment: hmm, isn't …
- Reply: Seems different. This is what I get from … and this is what I get from …
- Comment: Seems like …
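The distinction being discussed here: formatting only the top-level throwable's frames (the old `getStackTrace()` join) omits the cause chain, whereas `printStackTrace()` writes the full chain including "Caused by:" frames from the executor. Python's `traceback` module has the same split, which this sketch uses as an analogy; it is not the Py4J code from the diff:

```python
import traceback

def failing():
    try:
        raise RuntimeError("executor-side failure")
    except RuntimeError as err:
        # Wrap, the way a driver-side error wraps an executor-side cause.
        raise ValueError("driver-side wrapper") from err

try:
    failing()
except ValueError as e:
    # Only the outermost exception's frames, no cause chain
    # (analogous to joining e.getStackTrace() on the JVM side):
    top_only = "".join(traceback.format_tb(e.__traceback__))
    # Full chain, including the RuntimeError that caused it
    # (analogous to e.printStackTrace()):
    full = "".join(traceback.format_exception(type(e), e, e.__traceback__))

assert "RuntimeError" not in top_only
assert "RuntimeError: executor-side failure" in full
```

The `full` form is what the PR now captures into `stacktrace`, so executor-side causes survive in the debug output.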
```diff
     if s.startswith('org.apache.spark.sql.AnalysisException: '):
-        return AnalysisException(s.split(': ', 1)[1], stackTrace, c)
+        return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.analysis'):
-        return AnalysisException(s.split(': ', 1)[1], stackTrace, c)
+        return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
-        return ParseException(s.split(': ', 1)[1], stackTrace, c)
+        return ParseException(s.split(': ', 1)[1], stacktrace, c)
```
Comment on lines 109 to +110:

- Comment: For some exceptions, like …
- Reply: I think … If developers want to debug, they can enable …
```diff
     if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
-        return StreamingQueryException(s.split(': ', 1)[1], stackTrace, c)
+        return StreamingQueryException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.execution.QueryExecutionException: '):
-        return QueryExecutionException(s.split(': ', 1)[1], stackTrace, c)
+        return QueryExecutionException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('java.lang.IllegalArgumentException: '):
-        return IllegalArgumentException(s.split(': ', 1)[1], stackTrace, c)
-    return UnknownException(s, stackTrace, c)
+        return IllegalArgumentException(s.split(': ', 1)[1], stacktrace, c)
+    if c is not None and (
+            c.toString().startswith('org.apache.spark.api.python.PythonException: ')
+            # To make sure this only catches Python UDFs.
+            and any(map(lambda v: "org.apache.spark.sql.execution.python" in v.toString(),
+                        c.getStackTrace()))):
+        msg = ("\n  An exception was thrown from Python worker in the executor. "
+               "The below is the Python worker stacktrace.\n%s" % c.getMessage())
+        return PythonException(msg, stacktrace)
+    return UnknownException(s, stacktrace, c)


 def capture_sql_exception(f):
```
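`convert_exception` is essentially a prefix-dispatch table from JVM class names (as rendered by `toString()`) to Python exception classes. A condensed sketch of that pattern; the prefixes come from the diff above, but the class and function names here are made up:

```python
class AnalysisExceptionSketch(Exception): pass
class ParseExceptionSketch(Exception): pass
class UnknownExceptionSketch(Exception): pass

# Prefix of the JVM throwable's toString() -> Python exception class,
# mirroring the startswith() chain in convert_exception.
_PREFIXES = [
    ('org.apache.spark.sql.AnalysisException: ', AnalysisExceptionSketch),
    ('org.apache.spark.sql.catalyst.parser.ParseException: ', ParseExceptionSketch),
]

def convert_sketch(jvm_message):
    for prefix, cls in _PREFIXES:
        if jvm_message.startswith(prefix):
            # Keep only the human-readable part after "ClassName: ".
            return cls(jvm_message.split(': ', 1)[1])
    return UnknownExceptionSketch(jvm_message)

exc = convert_sketch("org.apache.spark.sql.AnalysisException: cannot resolve '`a`'")
print(type(exc).__name__, str(exc))
```

Splitting on `': '` with `maxsplit=1` is what strips the JVM class name off the front, so the user only sees the message text.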
```diff
@@ -99,7 +132,9 @@ def deco(*a, **kw):
         except py4j.protocol.Py4JJavaError as e:
             converted = convert_exception(e.java_exception)
             if not isinstance(converted, UnknownException):
-                raise converted
+                # Hide where the exception came from that shows a non-Pythonic
+                # JVM exception message.
+                raise_from(converted)
             else:
                 raise
     return deco
```
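`capture_sql_exception` wraps every Py4J-facing call in a decorator that converts JVM errors before they reach the user. A stripped-down version of that pattern, using stand-in error types instead of `py4j.protocol.Py4JJavaError` and the real `CapturedException` hierarchy:

```python
import functools

class FakeJavaError(Exception):
    """Stand-in for py4j.protocol.Py4JJavaError."""

class ConvertedError(Exception):
    """Stand-in for a CapturedException subclass."""

def capture_sketch(f):
    @functools.wraps(f)
    def deco(*a, **kw):
        try:
            return f(*a, **kw)
        except FakeJavaError as e:
            # Re-raise as a Pythonic exception; "from None" hides the
            # original (JVM-side) exception from the traceback, which is
            # what raise_from() does on Python 3.
            raise ConvertedError(str(e)) from None
    return deco

@capture_sketch
def run_query(ok):
    if not ok:
        raise FakeJavaError("o.a.s.sql.AnalysisException: cannot resolve")
    return "result"
```

Unknown errors fall through unchanged in the real code (`else: raise`), so only recognized exception types get the friendlier treatment.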
|
```diff
@@ -1784,6 +1784,15 @@ object SQLConf {
     .version("3.0.0")
     .fallbackConf(ARROW_EXECUTION_ENABLED)

+  val PYSPARK_JVM_STACKTRACE_ENABLED =
+    buildConf("spark.sql.pyspark.jvmStacktrace.enabled")
+      .doc("When true, it shows the JVM stacktrace in the user-facing PySpark exception " +
+        "together with Python stacktrace. By default, it is disabled and hides JVM stacktrace " +
+        "and shows a Python-friendly exception only.")
+      .version("3.0.0")
```
Comment on the `.version("3.0.0")` line:

- Comment: Is this targeting 3.0.0?
- Reply: Can be arguable, but it virtually changes the exception message only, at the core. I personally think it's okay/good to have it in 3.0, but I am okay to retarget if there's any concern about it.
- Comment: @gatorsmile, are you okay with 3.0.0 targeting here? One question for @HyukjinKwon: do we want this as a dynamic configuration instead of a static configuration? I mean, do we want to switch it on/off during runtime?
- Reply: It doesn't strictly have to be in Spark 3.0. I just wanted to have feedback from users more quickly, and thought it's good to try this in Spark 3.0 since technically we just touch the error messages here. I don't feel strongly that it has to land in Spark 3.0 — it's okay to retarget 3.1 if anyone feels strongly it should only be in master. Let me know :-)
```diff
+      .booleanConf
+      .createWithDefault(false)
+
   val ARROW_SPARKR_EXECUTION_ENABLED =
     buildConf("spark.sql.execution.arrow.sparkr.enabled")
       .doc("When true, make use of Apache Arrow for columnar data transfers in SparkR. " +

@@ -3063,6 +3072,8 @@ class SQLConf extends Serializable with Logging {

   def arrowPySparkEnabled: Boolean = getConf(ARROW_PYSPARK_EXECUTION_ENABLED)

+  def pysparkJVMStacktraceEnabled: Boolean = getConf(PYSPARK_JVM_STACKTRACE_ENABLED)

   def arrowSparkREnabled: Boolean = getConf(ARROW_SPARKR_EXECUTION_ENABLED)

   def arrowPySparkFallbackEnabled: Boolean = getConf(ARROW_PYSPARK_FALLBACK_ENABLED)
```
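Since the new entry is a regular (non-static) SQL conf, a user who wants the full JVM stacktrace back would presumably toggle it at runtime like any other conf. A sketch of such usage — the conf name comes from the diff above, but this snippet assumes an already-running SparkSession bound to `spark`:

```python
# Opt back in to the full JVM stacktrace when debugging.
# Assumes "spark" is an active SparkSession; the conf name is
# "spark.sql.pyspark.jvmStacktrace.enabled" as defined in this PR.
spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")
```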
Comment (on `raise_from`): This way, actually I mimicked `six`.