[SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace #49535

Open
wants to merge 7 commits into master

Conversation

@wengh wengh commented Jan 17, 2025

What changes were proposed in this pull request?

This PR adds a new configuration, spark.sql.execution.pyspark.udf.hideTraceback.enabled. When it is enabled, only the exception class and message are included when handling an exception from a Python UDF; the stack trace is omitted. The configuration is turned off by default.
This PR also adds a new optional parameter hide_traceback to handle_worker_exception to override the configuration.

Suggested review order:

  1. python/pyspark/util.py: logic changes
  2. python/pyspark/tests/test_util.py: unit tests
  3. other files: adding new configuration

Why are the changes needed?

This allows library-provided UDFs to show only the relevant error message, without an unnecessary stack trace.

Does this PR introduce any user-facing change?

If the configuration is turned off (the default), there is no user-facing change.
Otherwise, the stack trace is not included in the error message when an exception from a Python UDF is handled.
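For example, the new configuration can be toggled at runtime on an existing session (a minimal sketch; spark is assumed to be an active SparkSession):

# Hide Python UDF stack traces for this session (default is false)
spark.conf.set("spark.sql.execution.pyspark.udf.hideTraceback.enabled", "true")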

Example that illustrates the difference
from pyspark.errors.exceptions.base import PySparkRuntimeError
from pyspark.sql.types import IntegerType, StructField, StructType
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult
from pyspark.sql.functions import udtf


@udtf()
class PythonUDTF:
    @staticmethod
    def analyze(x: AnalyzeArgument) -> AnalyzeResult:
        raise PySparkRuntimeError("[XXX] My PySpark runtime error.")

    def eval(self, x: int):
        yield (x,)


spark.udtf.register("my_udtf", PythonUDTF)
spark.sql("select * from my_udtf(1)").show()

With the configuration turned off, the last line gives:

...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_VALUED_FUNCTION_FAILED_TO_ANALYZE_IN_PYTHON] Failed to analyze the Python user defined table function: Traceback (most recent call last):
  File "<stdin>", line 7, in analyze
pyspark.errors.exceptions.base.PySparkRuntimeError: [XXX] My PySpark runtime error. SQLSTATE: 38000; line 1 pos 14

With the configuration turned on, the last line gives:

...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_VALUED_FUNCTION_FAILED_TO_ANALYZE_IN_PYTHON] Failed to analyze the Python user defined table function: pyspark.errors.exceptions.base.PySparkRuntimeError: [XXX] My PySpark runtime error. SQLSTATE: 38000; line 1 pos 14

How was this patch tested?

Added unit tests in python/pyspark/tests/test_util.py, covering two cases with the configuration turned on and off respectively.
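A rough sketch of what such a test case can look like (illustrative only; the helper names and assertions here are not the exact code in test_util.py):

import io
import os

from pyspark.util import handle_worker_exception


def capture(hide: bool) -> bytes:
    # Toggle the environment variable the worker checks, then capture what
    # handle_worker_exception writes for a sample exception.
    if hide:
        os.environ["SPARK_HIDE_TRACEBACK"] = "1"
    else:
        os.environ.pop("SPARK_HIDE_TRACEBACK", None)
    buf = io.BytesIO()
    try:
        raise RuntimeError("boom")
    except RuntimeError as e:
        handle_worker_exception(e, buf)
    return buf.getvalue()


assert b"Traceback" in capture(hide=False)      # full stack trace by default
assert b"Traceback" not in capture(hide=True)   # only the class and message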

Was this patch authored or co-authored using generative AI tooling?

No

@wengh wengh force-pushed the spark-50858-hide-udf-stack-trace branch from c46ecf5 to 174fdea on January 17, 2025 00:17
@HyukjinKwon
Member

I think we should show the UDF exception. Otherwise users won't know what's going on and why their job fails.

@allisonwang-db
Contributor

allisonwang-db commented Jan 17, 2025

@HyukjinKwon I agree, and this PR is just to make it configurable (with the default value set to false - show stack trace by default). There are many user-friendly errors on the Python side, but they are often buried in a long Python-side stack trace. This change is intended to optionally hide these stack traces to improve the user experience.

@wengh wengh marked this pull request as ready for review January 17, 2025 18:40
@wengh
Author

wengh commented Jan 21, 2025

@allisonwang-db @ueshin Could you review this PR, which adds a configuration to hide the Python stack trace from analyze_udtf?

@@ -122,6 +122,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
protected val authSocketTimeout = conf.get(PYTHON_AUTH_SOCKET_TIMEOUT)
private val reuseWorker = conf.get(PYTHON_WORKER_REUSE)
protected val faultHandlerEnabled: Boolean = conf.get(PYTHON_WORKER_FAULTHANLDER_ENABLED)
protected val hideTraceback: Boolean = false
Contributor

Can we use conf.get(PYSPARK_HIDE_TRACEBACK) here so that we don't need to override every subclass?

Author

@wengh wengh Jan 21, 2025

The config is defined in org.apache.spark.sql.internal.SQLConf, which seems to be inaccessible from here. For reference, PYSPARK_SIMPLIFIED_TRACEBACK is also defined in SQLConf, so BasePythonRunner subclasses have to override it.

Is there an advantage to putting it in SQLConf rather than, e.g., org.apache.spark.internal.config.Python?

Member

A conf in SQLConf is a session-based conf that can also be set at runtime, while any conf in the core module or StaticSQLConf is cluster-wide and can't be changed while the cluster is running.
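To illustrate the distinction (the conf names below are just examples, not part of this PR): a SQLConf entry can be changed per session at runtime, while a core conf is fixed when the application starts.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # core conf (org.apache.spark.internal.config.Python): fixed at application start
    .config("spark.python.worker.reuse", "true")
    .getOrCreate()
)

# SQLConf entry: session-scoped and can be toggled at runtime
spark.conf.set("spark.sql.execution.pyspark.udf.hideTraceback.enabled", "true")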

python/pyspark/util.py (outdated comment, resolved)
"hiding the stack trace.")
.version("4.0.0")
.booleanConf
.createWithDefault(false)
Member

Another way is to create this conf as an int and show the max depth of the stack trace, but I don't feel strongly.

Author

@wengh wengh Jan 22, 2025

Is there a use case where we want to show only the last k frames of the stack? I'm under the impression that we want to show the full stack trace for most exceptions, and completely hide the stack trace for specific library exceptions when the message is sufficient to identify the reason.
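For reference, a hypothetical sketch of the int-valued alternative mentioned above (not something this PR implements):

import traceback

def format_last_k_frames(exc: BaseException, k: int) -> str:
    # A negative limit keeps only the last abs(limit) frames of the traceback.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__, limit=-k))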

@wengh wengh requested a review from allisonwang-db January 23, 2025 01:22
@wengh wengh force-pushed the spark-50858-hide-udf-stack-trace branch from c412da6 to c4f40b0 on January 23, 2025 18:03
Member

@ueshin ueshin left a comment

Could you show an example of the error messages, with this enabled and disabled, in the PR description?

@@ -462,22 +462,40 @@ def wrapped(*args: Any, **kwargs: Any) -> Any:
return f # type: ignore[return-value]


-def handle_worker_exception(e: BaseException, outfile: IO) -> None:
+def handle_worker_exception(
+    e: BaseException, outfile: IO, hide_traceback: Optional[bool] = None
Member

Do we need to pass hide_traceback?

Author

@wengh wengh Jan 23, 2025

It's optional, and uses the value of SPARK_HIDE_TRACEBACK by default (see docstring)

Member

I don't see any place that passes this parameter except for the tests.
I just thought test_env_full and test_env_hide_traceback would be enough without taking it here?
If we want to take this, we need more tests, e.g. that "setting the env var but passing the parameter" ignores the env var.

Author

We can add more tests for the override behaviour

Comment on lines +473 to +481
Parameters
----------
e : BaseException
Exception handled
outfile : IO
IO object to write the exception info
hide_traceback : bool, optional
Whether to hide the traceback in the output.
By default, hides the traceback if environment variable SPARK_HIDE_TRACEBACK is set.
Member

Thanks for adding the parameters here!

Comment on lines 3480 to 3482
.doc(
"When true, only show the message of the exception from Python UDFs, " +
"hiding the stack trace.")
Member

We may want to describe a bit more about the relationship between this and simplifiedTraceback, here or in the doc of simplifiedTraceback.
Seems like if this is enabled, simplifiedTraceback will be ignored?

Author

Yeah, simplifiedTraceback is not applicable if hideTraceback is set, unless the caller sets the parameter hide_traceback=False to override the config.

I'll update the description to reflect this.
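A minimal sketch of that precedence (illustrative only, not the actual worker code; the real simplified mode rewrites the traceback to drop PySpark-internal frames):

import traceback

def format_udf_error(exc: BaseException, hide_traceback: bool, simplified: bool) -> str:
    # hideTraceback takes precedence: only the exception class and message.
    if hide_traceback:
        return "".join(traceback.format_exception_only(type(exc), exc))
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    if simplified:
        # crude stand-in for the real behaviour, which hides PySpark-internal frames
        lines = [ln for ln in lines if "/pyspark/" not in ln]
    return "".join(lines)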


@wengh
Author

wengh commented Jan 23, 2025

Could you show an example of the error messages, with this enabled and disabled, in the PR description?

@ueshin

It's in the collapsible section "Example that illustrates the difference"

@ueshin
Member

ueshin commented Jan 23, 2025

It's in the collapsible section

ah, I didn't notice that. thanks!
